Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
- URL: http://arxiv.org/abs/2407.02075v1
- Date: Tue, 2 Jul 2024 09:08:06 GMT
- Title: Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
- Authors: Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano
- Abstract summary: We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS).
Label Anything demonstrates remarkable generalizability across multiple classes with minimal examples required per class.
Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility.
- Score: 10.262029691744921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from $1$-way $1$-shot to complex $N$-way $K$-shot configurations while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything.
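As a rough illustration of the multi-class, mixed-prompt support sets described in the abstract, the sketch below shows one possible way to represent an $N$-way $K$-shot episode whose support examples carry points, boxes, or masks. The class and field names are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

```python
# Minimal sketch (illustrative only, not the Label Anything implementation) of an
# N-way K-shot episode whose support annotations mix visual prompt types.
from dataclasses import dataclass, field
from typing import List, Literal, Sequence

PromptType = Literal["point", "box", "mask"]

@dataclass
class VisualPrompt:
    """One annotation for a single class on a support image."""
    kind: PromptType
    # point: (x, y); box: (x1, y1, x2, y2); mask: flattened binary grid
    data: Sequence[float]

@dataclass
class SupportExample:
    """A support image annotated for one class with one visual prompt."""
    image_path: str
    class_id: int
    prompt: VisualPrompt

@dataclass
class Episode:
    """An N-way K-shot episode: up to K annotated support examples for each of
    N classes, plus a query image to segment."""
    query_image_path: str
    support: List[SupportExample] = field(default_factory=list)

    def n_way(self) -> int:
        return len({ex.class_id for ex in self.support})

    def k_shot(self) -> int:
        counts = {}
        for ex in self.support:
            counts[ex.class_id] = counts.get(ex.class_id, 0) + 1
        return max(counts.values()) if counts else 0

# Example: a 2-way 1-shot episode mixing prompt types, as the paper allows.
episode = Episode(
    query_image_path="query.jpg",
    support=[
        SupportExample("dog.jpg", class_id=0,
                       prompt=VisualPrompt("point", (120.0, 96.0))),
        SupportExample("cat.jpg", class_id=1,
                       prompt=VisualPrompt("box", (30.0, 40.0, 180.0, 210.0))),
    ],
)
print(episode.n_way(), episode.k_shot())  # -> 2 1
```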
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage textual features to dynamically adjust the classifier's weights for exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z)
- In-Context Learning for Extreme Multi-Label Classification [29.627891261947536]
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone.
We propose a general program that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems.
Our solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples.
arXiv Detail & Related papers (2024-01-22T18:09:52Z)
- Masked Cross-image Encoding for Few-shot Segmentation [16.445813548503708]
Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.
We propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction.
arXiv Detail & Related papers (2023-08-22T05:36:39Z)
- Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond [0.0]
We propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net).
It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity.
It achieves new state-of-the-art results in the FSS task, with mIoU of $77.6$ on PASCAL-$5^i$ and $59.4$ on COCO-$20^i$ in the 1-shot scenario.
arXiv Detail & Related papers (2023-08-15T02:46:49Z)
- Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view partial multi-label classification network named RANK.
We move beyond the fixed view-level weights inherent in existing methods and propose a quality-aware sub-network that dynamically assigns quality scores to each view of each sample.
Our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels.
arXiv Detail & Related papers (2023-03-30T03:09:25Z)
- APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z)
- Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z)
- Universal-to-Specific Framework for Complex Action Recognition [114.78468658086572]
We propose an effective universal-to-specific (U2S) framework for complex action recognition.
The U2S framework is composed of three sub-networks: a universal network, a category-specific network, and a mask network.
Experiments on a variety of benchmark datasets demonstrate the effectiveness of the U2S framework.
arXiv Detail & Related papers (2020-07-13T01:49:07Z)