Visual Prompt Selection for In-Context Learning Segmentation
- URL: http://arxiv.org/abs/2407.10233v1
- Date: Sun, 14 Jul 2024 15:02:54 GMT
- Title: Visual Prompt Selection for In-Context Learning Segmentation
- Authors: Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang
- Abstract summary: In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
- Score: 77.15684360470152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual prompts or simply apply similarity sorting to select contextual examples. In this paper, we focus on rethinking and improving the example selection strategy. By comprehensive comparisons, we first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation. Based on the above insights, we propose a new stepwise context search method. Different from previous works, we construct a small yet rich candidate pool and adaptively search the well-matched contexts. More importantly, this method effectively reduces the annotation cost by compacting the search space. Extensive experiments show that our method is an effective strategy for selecting examples and enhancing segmentation performance.
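The abstract names the ingredients of the stepwise search (a compact candidate pool, adaptive matching, prompt diversity) without spelling out the algorithm. Below is a minimal sketch of how such a selection could work, assuming cosine similarity over L2-normalized image embeddings and an MMR-style relevance/diversity trade-off; the function names, the farthest-point pooling, and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_candidate_pool(embeddings: np.ndarray, pool_size: int) -> np.ndarray:
    """Compact a large labeled set into a small yet diverse candidate pool via
    farthest-point sampling (a hypothetical stand-in for the paper's pool
    construction; compacting the pool is what reduces annotation cost)."""
    n = embeddings.shape[0]
    chosen = [0]
    dists = np.full(n, np.inf)
    for _ in range(pool_size - 1):
        # distance of every point to its nearest already-chosen point
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[chosen[-1]], axis=1))
        chosen.append(int(dists.argmax()))
    return np.array(chosen)

def select_contexts(query: np.ndarray, pool: np.ndarray, k: int, alpha: float = 0.5) -> list[int]:
    """Stepwise greedy search: each step adds the candidate most similar to the
    query yet least redundant with contexts already picked (assumes k <= len(pool)
    and L2-normalized embeddings)."""
    sim_to_query = pool @ query  # cosine similarity per candidate
    selected: list[int] = []
    for _ in range(k):
        best, best_score = -1, -np.inf
        for i in range(pool.shape[0]):
            if i in selected:
                continue
            # penalize candidates that duplicate an already-selected context
            redundancy = max((pool[i] @ pool[j] for j in selected), default=0.0)
            score = alpha * sim_to_query[i] - (1 - alpha) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

A caller might run `pool_idx = build_candidate_pool(embs, 64)` and then `select_contexts(q, embs[pool_idx], k=3)`. The real method likely differs in both scoring and pooling, but the sketch captures the two claims of the abstract: diversity-aware selection beats pure similarity sorting, and compacting the search space compacts the annotation cost, since only the pooled images need masks.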
Related papers
- A Simple Image Segmentation Framework via In-Context Examples [59.319920526160466]
We present SINE, a simple image segmentation framework utilizing in-context examples.
We introduce an In-context Interaction module to complement in-context information and produce correlations between the target image and the in-context example.
Experiments on various segmentation tasks show the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-10-07T08:59:05Z)
- A Bottom-Up Approach to Class-Agnostic Image Segmentation [4.086366531569003]
We present a novel bottom-up formulation for addressing the class-agnostic segmentation problem.
We supervise our network directly on the projective sphere of its feature space.
Our bottom-up formulation exhibits exceptional generalization capability, even when trained on datasets designed for class-based segmentation.
arXiv Detail & Related papers (2024-09-20T17:56:02Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation [87.18373801829314]
In-context segmentation aims at segmenting novel images using a few labeled example images, termed "in-context examples".
We propose SEGIC, an end-to-end segment-in-context framework built upon a single vision foundation model (VFM).
SEGIC is a straightforward yet effective approach that yields state-of-the-art performance on one-shot segmentation benchmarks.
arXiv Detail & Related papers (2023-11-24T18:59:42Z)
- PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning [44.48704588318053]
We develop a novel method termed PartSeg for few-shot part segmentation based on multimodal learning.
We conduct extensive experiments on the PartImageNet and Pascal-Part datasets.
arXiv Detail & Related papers (2023-08-24T13:03:42Z)
- FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation [42.89720785573885]
FreeSeg is a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
We show that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks.
arXiv Detail & Related papers (2023-03-30T08:42:49Z)
- CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation [56.58365347854647]
We introduce a novel cost-based approach to adapt vision-language foundation models, notably CLIP.
Our method adapts CLIP for segmenting seen and unseen classes by fine-tuning its encoders; a minimal sketch of the underlying cost-volume step appears after this list.
arXiv Detail & Related papers (2023-03-21T12:28:21Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without requiring any dense annotations.
On three benchmark datasets, our method directly segments objects of arbitrary categories and outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization [21.904563910555368]
We propose a novel learning framework to construct a set of discriminative data domains within each image-text pair.
Our approach can generally improve the learning efficiency and performance of existing metric learning frameworks.
arXiv Detail & Related papers (2020-10-23T01:48:37Z)
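The "cost" in the CAT-Seg entry above refers to a cost volume: cosine similarities between dense CLIP image features and per-class text embeddings, which an aggregation network then refines into segmentation masks. A minimal sketch of that first step, assuming PyTorch; the shapes and normalization choices are illustrative, and the aggregation network itself is omitted.

```python
import torch
import torch.nn.functional as F

def clip_cost_volume(image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity cost volume between dense image features and
    per-class text embeddings, in the spirit of cost-aggregation methods
    such as CAT-Seg (shapes are illustrative assumptions).

    image_feats: (B, C, H, W) dense CLIP image features
    text_embeds: (N, C) CLIP text embeddings, one per class name
    returns:     (B, N, H, W) matching cost per class and pixel
    """
    img = F.normalize(image_feats, dim=1)  # unit norm along the channel dim
    txt = F.normalize(text_embeds, dim=1)  # unit norm per class embedding
    # dot product of every pixel feature with every class embedding
    return torch.einsum("bchw,nc->bnhw", img, txt)
```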