Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
- URL: http://arxiv.org/abs/2407.02075v3
- Date: Fri, 25 Jul 2025 13:01:46 GMT
- Title: Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
- Authors: Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano
- Abstract summary: Few-shot semantic segmentation aims to segment objects from previously unseen classes using only a limited number of labeled examples. We introduce Label Anything, a novel transformer-based architecture designed for multi-prompt, multi-way few-shot semantic segmentation.
- Score: 10.262029691744921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot semantic segmentation aims to segment objects from previously unseen classes using only a limited number of labeled examples. In this paper, we introduce Label Anything, a novel transformer-based architecture designed for multi-prompt, multi-way few-shot semantic segmentation. Our approach leverages diverse visual prompts -- points, bounding boxes, and masks -- to create a highly flexible and generalizable framework that significantly reduces annotation burden while maintaining high accuracy. Label Anything makes three key contributions: ($\textit{i}$) we introduce a new task formulation that relaxes conventional few-shot segmentation constraints by supporting various types of prompts, multi-class classification, and enabling multiple prompts within a single image; ($\textit{ii}$) we propose a novel architecture based on transformers and attention mechanisms; and ($\textit{iii}$) we design a versatile training procedure allowing our model to operate seamlessly across different $N$-way $K$-shot and prompt-type configurations with a single trained model. Our extensive experimental evaluation on the widely used COCO-$20^i$ benchmark demonstrates that Label Anything achieves state-of-the-art performance among existing multi-way few-shot segmentation methods, while significantly outperforming leading single-class models when evaluated in multi-class settings. Code and trained models are available at https://github.com/pasqualedem/LabelAnything.
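The task formulation in the abstract (several classes, several prompt types, and multiple prompts within a single image) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: every class, field, and identifier name here (`ClassPrompts`, `SupportImage`, `Episode`, `num_support`, the image ids) is hypothetical and is not taken from the LabelAnything codebase, and masks are plain nested lists rather than tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names below are hypothetical and for illustration only;
# they are NOT the API of the LabelAnything repository.
Point = Tuple[int, int]            # (x, y) pixel coordinate
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max)
Mask = List[List[int]]             # binary mask as nested lists (H x W)


@dataclass
class ClassPrompts:
    """Prompts annotating one class in one support image."""
    points: List[Point] = field(default_factory=list)
    boxes: List[Box] = field(default_factory=list)
    masks: List[Mask] = field(default_factory=list)

    def count(self) -> int:
        # Total number of prompts, regardless of type.
        return len(self.points) + len(self.boxes) + len(self.masks)


@dataclass
class SupportImage:
    """One support image; it may carry prompts for several classes."""
    image_id: str
    prompts: Dict[int, ClassPrompts] = field(default_factory=dict)


@dataclass
class Episode:
    """A multi-way episode: a set of classes, support images, one query."""
    class_ids: List[int]
    support: List[SupportImage]
    query_image_id: str

    @property
    def n_way(self) -> int:
        # Number of classes to segment in the query image.
        return len(self.class_ids)

    @property
    def num_support(self) -> int:
        # Number of support images (assumed stand-in for the shot count).
        return len(self.support)

    def prompts_per_class(self) -> Dict[int, int]:
        """Count the prompts available for each class across all support images."""
        totals = {c: 0 for c in self.class_ids}
        for image in self.support:
            for class_id, class_prompts in image.prompts.items():
                if class_id in totals:
                    totals[class_id] += class_prompts.count()
        return totals


# A 2-way episode mixing points, a box, and a mask.
episode = Episode(
    class_ids=[1, 2],
    support=[
        SupportImage("img_a", {
            1: ClassPrompts(points=[(10, 20)]),
            2: ClassPrompts(boxes=[(0, 0, 50, 50)]),
        }),
        SupportImage("img_b", {
            1: ClassPrompts(points=[(3, 4), (5, 6)], masks=[[[0, 1], [1, 1]]]),
        }),
    ],
    query_image_id="img_q",
)
print(episode.n_way, episode.num_support, episode.prompts_per_class())
```

Keying the prompts by class id inside each support image is one simple way to let a single image annotate several classes with different prompt types; the actual model input format may differ.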
Related papers
- DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation [2.7624021966289605]
Few-shot semantic segmentation (FSS) aims to enable models to segment novel/unseen object classes using only a limited number of labeled examples.
We propose a novel framework that utilizes large language models (LLMs) to adapt general class semantic information to the query image.
Our framework achieves state-of-the-art performance by a significant margin, demonstrating superior generalization to novel classes and robustness across diverse scenarios.
arXiv Detail & Related papers (2025-03-06T01:42:28Z) - Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning [49.65841002338575]
This paper focuses on the challenging yet practical multi-label class-incremental learning (MLCIL) problem.
We propose a novel class-independent incremental network (CINet) to extract multiple class-level embeddings for multi-label samples.
It learns and preserves the knowledge of different classes by constructing class-specific tokens.
arXiv Detail & Related papers (2025-03-01T14:40:52Z) - LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification. LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z) - CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage the textual feature to dynamically adjust the classifier's weights for exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z) - In-Context Learning for Extreme Multi-Label Classification [29.627891261947536]
Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone.
We propose a general program that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems.
Our solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples.
arXiv Detail & Related papers (2024-01-22T18:09:52Z) - Masked Cross-image Encoding for Few-shot Segmentation [16.445813548503708]
Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.
We propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction.
arXiv Detail & Related papers (2023-08-22T05:36:39Z) - Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond [0.0]
We propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net).
It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity.
It achieves new state-of-the-art results in the FSS task, with mIoU of $77.6$ on PASCAL-$5^i$ and $59.4$ on COCO-$20^i$ in the 1-shot scenario.
arXiv Detail & Related papers (2023-08-15T02:46:49Z) - Learning from Pseudo-labeled Segmentation for Multi-Class Object Counting [35.652092907690694]
Class-agnostic counting (CAC) has numerous potential applications across various domains.
The goal is to count objects of an arbitrary category during testing, based on only a few annotated exemplars.
We show that the segmentation model trained on these pseudo-labeled masks can effectively localize objects of interest for an arbitrary multi-class image.
arXiv Detail & Related papers (2023-07-15T01:33:19Z) - Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view partial multi-label classification network named RANK.
We break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample.
Our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels.
arXiv Detail & Related papers (2023-03-30T03:09:25Z) - APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z) - One-Class Meta-Learning: Towards Generalizable Few-Shot Open-Set Classification [2.28438857884398]
We introduce two independent few-shot one-class classification methods: Meta Binary Cross-Entropy (Meta-BCE) and One-Class Meta-Learning (OCML).
Both methods can augment any existing few-shot learning method without requiring retraining to work in a few-shot multiclass open-set setting without degrading its closed-set performance.
They surpass the state-of-the-art methods in the few-shot multiclass open-set and few-shot one-class tasks.
arXiv Detail & Related papers (2021-09-14T17:52:51Z) - Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer [112.95747173442754]
A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier.
Most existing methods meta-learn all three model components for fast adaptation to a new class.
In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier.
arXiv Detail & Related papers (2021-08-06T10:20:08Z) - Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z) - Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z) - Universal-to-Specific Framework for Complex Action Recognition [114.78468658086572]
We propose an effective universal-to-specific (U2S) framework for complex action recognition.
The U2S framework is composed of three subnetworks: a universal network, a category-specific network, and a mask network.
Experiments on a variety of benchmark datasets demonstrate the effectiveness of the U2S framework.
arXiv Detail & Related papers (2020-07-13T01:49:07Z) - Few-shot 3D Point Cloud Semantic Segmentation [138.80825169240302]
We propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method.
Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings.
arXiv Detail & Related papers (2020-06-22T08:05:25Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.