Language-guided Few-shot Semantic Segmentation
- URL: http://arxiv.org/abs/2311.13865v1
- Date: Thu, 23 Nov 2023 09:08:49 GMT
- Title: Language-guided Few-shot Semantic Segmentation
- Authors: Jing Wang, Yuang Liu, Qiang Zhou, Fan Wang
- Abstract summary: We propose an innovative solution to tackle the challenge of few-shot semantic segmentation using only language information.
Our approach involves a vision-language-driven mask distillation scheme, which generates high-quality pseudo-semantic masks from text prompts.
Experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation.
- Score: 23.46604057006498
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Few-shot learning is a promising way to reduce the labeling cost when adapting to new categories, guided by a small, well-labeled support set. But for few-shot semantic segmentation, the pixel-level annotations of support images are still expensive. In this paper, we propose an innovative solution that tackles few-shot semantic segmentation using only language information, i.e., image-level text labels. Our approach involves a vision-language-driven mask distillation scheme, which contains a vision-language pretraining (VLP) model and a mask refiner, to generate high-quality pseudo-semantic masks from text prompts. We additionally introduce a distributed prototype supervision method and a complementary correlation matching module to guide the model in mining precise semantic relations between support and query images. Experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation and achieves results competitive with recent vision-guided methods.
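A minimal sketch of how such a text-driven pseudo-mask could be produced is given below. The abstract does not spell out the VLP model or the mask refiner, so `vlp_image_encoder`, `vlp_text_encoder`, `mask_refiner`, and the threshold `tau` are illustrative placeholders assuming a CLIP-style shared image-text embedding space, not the authors' actual components.

```python
import torch.nn.functional as F

def pseudo_semantic_mask(image, class_names, vlp_image_encoder, vlp_text_encoder,
                         mask_refiner, tau=0.35):
    """Sketch: build a pseudo-semantic mask from image-level text labels only."""
    # (1) Patch-level image embeddings, assumed aligned with the text space: [N, D].
    patch_emb = F.normalize(vlp_image_encoder(image), dim=-1)
    # (2) Text embeddings for simple prompts built from the class names: [C, D].
    prompts = [f"a photo of a {c}" for c in class_names]
    text_emb = F.normalize(vlp_text_encoder(prompts), dim=-1)
    # (3) Cosine similarity between every patch and every class prompt: [N, C].
    sim = patch_emb @ text_emb.t()
    side = int(patch_emb.shape[0] ** 0.5)              # assumes a square patch grid
    sim = sim.t().reshape(len(class_names), 1, side, side)
    # (4) Coarse per-class mask: keep patches whose similarity exceeds the threshold.
    coarse = (sim > tau).float()
    # (5) The refiner (any module that upsamples and cleans the coarse map) produces
    #     the final pseudo-semantic mask at image resolution.
    return mask_refiner(coarse, image)
```

In the paper, the mask refiner plays the role sketched in step (5); any module that sharpens the coarse patch-level map to pixel resolution would slot in there.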
Related papers
- CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting [8.12405696290333]
CPSeg is a framework designed to augment image segmentation performance by integrating a novel "Chain-of-Thought" process.
We propose a new vision-language dataset, FloodPrompt, which includes images, semantic masks, and corresponding text information.
arXiv Detail & Related papers (2023-10-24T13:32:32Z)
- Shatter and Gather: Learning Referring Image Segmentation with Text Supervision [52.46081425504072]
We present a new model that discovers semantic entities in the input image and then combines the entities relevant to the text query to predict the mask of the referent.
Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed existing methods for the same task as well as recent open-vocabulary segmentation models on all benchmarks.
arXiv Detail & Related papers (2023-08-29T15:39:15Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach generates fine-grained patch-text pairs by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability.
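The mixing step summarized above can be illustrated with a small sketch. This is not the authors' exact recipe; `mix_patches` and the 16-pixel patch size are assumptions, showing only how patches from different images can be shuffled while an index map records which caption each patch still corresponds to.

```python
import torch

def mix_patches(images, patch_size=16):
    """images: [B, C, H, W] -> mixed patch sequences plus a per-patch source map."""
    B, C, H, W = images.shape
    ph, pw = H // patch_size, W // patch_size
    # Cut every image into non-overlapping patches: [B, N, C * patch_size**2].
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, ph * pw, -1)
    N = ph * pw
    # For every patch position of every mixed image, pick a random source image.
    src = torch.randint(0, B, (B, N))
    mixed = patches[src, torch.arange(N).expand(B, N)]
    # `src[b, n]` records which image (and hence which caption) patch n of mixed
    # image b came from, giving fine-grained patch-text pairs for training.
    return mixed, src
```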
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- LPN: Language-guided Prototypical Network for few-shot classification [16.37959398470535]
Few-shot classification aims to adapt to new tasks with limited labeled examples.
Recent methods explore suitable measures for the similarity between the query and support images.
We propose a Language-guided Prototypical Network (LPN) for few-shot classification.
arXiv Detail & Related papers (2023-07-04T06:54:01Z)
- Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages existing pretrained vision-language (VL) models to train semantic segmentation models.
ZeroSeg requires no human segmentation labels: it distills the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z)
- ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency [126.88107868670767]
We propose multi-View Consistent learning (ViewCo) for text-supervised semantic segmentation.
We first propose text-to-views consistency modeling to learn correspondence for multiple views of the same input image.
We also propose cross-view segmentation consistency modeling to address the ambiguity issue of text supervision.
arXiv Detail & Related papers (2023-01-31T01:57:52Z)
- Single-Stream Multi-Level Alignment for Vision-Language Pretraining [103.09776737512078]
We propose a single-stream model that aligns the modalities at multiple levels.
We achieve this using two novel tasks: symmetric cross-modality reconstruction and pseudo-labeled keyword prediction.
We demonstrate top performance on a set of Vision-Language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA.
arXiv Detail & Related papers (2022-03-27T21:16:10Z)
- Incremental Learning in Semantic Segmentation from Image Labels [18.404068463921426]
Existing semantic segmentation approaches achieve impressive results, but struggle to update their models incrementally as new categories are uncovered.
This paper proposes a novel framework for Weakly Incremental Learning for Semantic Segmentation, which aims to learn to segment new classes from cheap and widely available image-level labels.
As opposed to existing approaches, which need to generate pseudo-labels offline, we use an auxiliary classifier, trained with image-level labels and regularized by the segmentation model, to obtain pseudo-supervision online and update the model incrementally.
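A hedged sketch of what such an online pseudo-supervision step could look like follows. The summary does not give the exact losses, so `seg_model`, `old_model`, `aux_classifier`, and the BCE-plus-distillation combination below are illustrative assumptions (a CAM-style classifier supplies pixel pseudo-labels for the new classes, while the previous model regularizes the old ones).

```python
import torch
import torch.nn.functional as F

def incremental_step(batch, seg_model, old_model, aux_classifier, optimizer):
    images, image_labels = batch                       # image_labels: multi-hot [B, C_new]
    logits = seg_model(images)                         # [B, C_old + C_new, H, W]
    with torch.no_grad():
        old_probs = old_model(images).softmax(dim=1)   # previous model's predictions
        cam = aux_classifier(images)                   # [B, C_new, H, W] class score maps
        # Online pseudo-labels: keep score maps only for classes present in the image.
        pseudo = cam.sigmoid() * image_labels.float()[..., None, None]
    c_new, c_old = pseudo.shape[1], old_probs.shape[1]
    # New classes: pixel-wise BCE against the pseudo-labels from the classifier.
    loss_new = F.binary_cross_entropy_with_logits(logits[:, -c_new:], pseudo)
    # Old classes: stay close to the previous model (a simple distillation term).
    loss_old = F.kl_div(logits[:, :c_old].log_softmax(dim=1), old_probs,
                        reduction="batchmean")
    (loss_new + loss_old).backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss_new.item(), loss_old.item()
```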
arXiv Detail & Related papers (2021-12-03T12:47:12Z)
- A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic Segmentation [40.27705176115985]
Few-shot semantic segmentation addresses the learning task in which only a few images with ground-truth pixel-level labels are available for the novel classes of interest.
We propose a novel meta-learning framework, which predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
Our proposed learning model can be viewed as a pixel-level meta-learner.
arXiv Detail & Related papers (2021-11-02T08:28:11Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
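As an illustration of the part-aware prototype idea (the paper's actual decomposition and graph neural network are not described in this summary), the sketch below clusters masked support features into a few part prototypes with plain k-means and scores query pixels against all parts instead of one holistic class prototype; `num_parts` and the k-means stand-in are assumptions.

```python
import torch
import torch.nn.functional as F

def part_prototypes(support_feat, support_mask, num_parts=5, iters=10):
    """support_feat: [D, H, W]; support_mask: [H, W] binary -> [num_parts, D]."""
    fg = support_feat.flatten(1)[:, support_mask.flatten() > 0].t()   # [N_fg, D]
    # Plain k-means over foreground features as a stand-in for part discovery.
    centers = fg[torch.randperm(fg.shape[0])[:num_parts]].clone()     # [K, D]
    for _ in range(iters):
        assign = torch.cdist(fg, centers).argmin(dim=1)               # [N_fg]
        for k in range(num_parts):
            if (assign == k).any():
                centers[k] = fg[assign == k].mean(dim=0)
    return centers

def query_part_scores(query_feat, prototypes):
    """query_feat: [D, H, W] -> per-pixel max similarity to any part prototype."""
    q = F.normalize(query_feat.flatten(1), dim=0)                     # [D, H*W]
    p = F.normalize(prototypes, dim=1)                                # [K, D]
    sim = p @ q                                                       # [K, H*W]
    return sim.max(dim=0).values.reshape(query_feat.shape[1:])        # [H, W]
```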
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
- Few-Shot Semantic Segmentation Augmented with Image-Level Weak Annotations [23.02986307143718]
Recent progress in few-shot semantic segmentation tackles the issue using only a few pixel-level annotated examples.
Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data.
We propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations.
arXiv Detail & Related papers (2020-07-03T04:58:20Z)