SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
- URL: http://arxiv.org/abs/2412.11998v1
- Date: Mon, 16 Dec 2024 17:26:06 GMT
- Title: SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
- Authors: Savinay Nagendra, Kashif Rashid, Chaopeng Shen, Daniel Kifer
- Abstract summary: We show how to leverage existing vision foundation models (VFMs) to reduce the incremental cost of creating few-shot segmentation models for new domains. Specifically, we introduce SAMIC, a small network that learns how to prompt VFMs in order to segment new types of objects in domain-specific applications.
- Score: 6.900101619562999
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Few-shot segmentation is the problem of learning to identify specific types of objects (e.g., airplanes) in images from a small set of labeled reference images. The current state of the art is driven by resource-intensive construction of models for every new domain-specific application. Such models must be trained on enormous labeled datasets of unrelated objects (e.g., cars, trains, animals) so that their ``knowledge'' can be transferred to new types of objects. In this paper, we show how to leverage existing vision foundation models (VFMs) to reduce the incremental cost of creating few-shot segmentation models for new domains. Specifically, we introduce SAMIC, a small network that learns how to prompt VFMs in order to segment new types of objects in domain-specific applications. SAMIC enables any task to be approached as a few-shot learning problem. At 2.6 million parameters, it is 94% smaller than the leading models (e.g., having ResNet 101 backbone with 45+ million parameters). Even using 1/5th of the training data provided by one-shot benchmarks, SAMIC is competitive with, or sets the state of the art, on a variety of few-shot and semantic segmentation datasets including COCO-$20^i$, Pascal-$5^i$, PerSeg, FSS-1000, and NWPU VHR-10.
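The abstract does not include code; as a rough illustration of the idea, the sketch below shows a tiny trainable network that maps pooled query features and a support-class embedding to spatial point prompts for a frozen promptable VFM such as SAM. All names and shapes here (PromptNet, feat_dim, num_points) are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of the SAMIC idea: a small trainable network predicts
# spatial point prompts for a frozen promptable segmenter (e.g., SAM).
# Names and shapes are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class PromptNet(nn.Module):
    """Maps query features plus a support-class embedding to k point prompts."""
    def __init__(self, feat_dim: int = 256, num_points: int = 5):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, num_points * 3),  # (x, y, fg/bg logit) per point
        )

    def forward(self, query_feat, support_emb):
        # query_feat: (B, D) pooled query-image features; support_emb: (B, D)
        out = self.mlp(torch.cat([query_feat, support_emb], dim=-1))
        out = out.view(-1, self.num_points, 3)
        coords = out[..., :2].sigmoid()    # normalized (x, y) in [0, 1]
        labels = (out[..., 2] > 0).long()  # 1 = foreground, 0 = background
        return coords, labels

net = PromptNet()
coords, labels = net(torch.randn(1, 256), torch.randn(1, 256))
print(coords.shape, labels.shape)  # torch.Size([1, 5, 2]) torch.Size([1, 5])
```

On this reading, only the small prompt network is trained for a new domain; the VFM itself stays frozen, which is what keeps the incremental cost at a few million parameters.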
Related papers
- No time to train! Training-Free Reference-Based Instance Segmentation [15.061599989448867]
This work investigates the task of object segmentation when provided with only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks, and instantiate our ideas via a multi-stage, training-free method.
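One plausible reading of the correspondence idea, sketched under assumed shapes (the function and threshold below are illustrative, not the paper's code): compare masked reference features against every target location and threshold the similarity to seed instance masks.

```python
# Illustrative correspondence computation between foundation-model features
# of a reference image (with its object mask) and a target image.
import torch
import torch.nn.functional as F

def correspondence_map(ref_feats, ref_mask, tgt_feats):
    # ref_feats, tgt_feats: (D, H, W) feature maps; ref_mask: (H, W) in {0, 1}
    _, H, W = tgt_feats.shape
    fg = ref_feats.flatten(1)[:, ref_mask.flatten().bool()]  # (D, N_fg)
    proto = F.normalize(fg.mean(dim=1), dim=0)               # mean fg feature
    tgt = F.normalize(tgt_feats.flatten(1), dim=0)           # (D, H*W)
    return (proto @ tgt).view(H, W)                          # cosine similarity

ref_feats, tgt_feats = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
ref_mask = torch.zeros(32, 32); ref_mask[8:16, 8:16] = 1
seed = correspondence_map(ref_feats, ref_mask, tgt_feats) > 0.2  # toy threshold
```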
arXiv Detail & Related papers (2025-07-03T16:59:01Z)
- LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery [45.87124064438554]
We introduce LISAt, a vision-language model designed to describe complex remote-sensing scenes. We trained LISAt on a new curated geospatial reasoning-segmentation dataset, GRES, with 27,615 annotations over 9,205 images. LISAt outperforms state-of-the-art open-domain models on reasoning segmentation tasks by 143.36% (gIoU).
arXiv Detail & Related papers (2025-05-05T17:56:25Z)
- TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation [40.49924427388922]
We propose a task-adaptive auto-visual prompt framework for Cross-Domain Few-Shot Segmentation (CD-FSS).
We incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain feature extraction and generate high-quality, learnable visual prompts.
Our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
arXiv Detail & Related papers (2024-09-09T07:43:58Z)
- Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples [61.66967790884943]
Referring video object segmentation (RVOS) relies on sufficient data for a given scene.
In more realistic scenarios, only minimal annotations are available for a new scene.
We propose a model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture.
The CMA module builds multimodal affinity from only a few samples, quickly learning new semantic information and enabling the model to adapt to different scenarios.
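The summary does not specify the CMA internals; a generic cross-attention sketch of computing affinity between language tokens and frame features, with assumed dimensions, might look like this:

```python
# Generic cross-attention sketch of a cross-modal affinity computation
# (illustrative shapes; not the paper's actual CMA module).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
text_feats = torch.randn(1, 10, 256)    # referring-expression tokens
frame_feats = torch.randn(1, 400, 256)  # flattened 20x20 frame features

# Visual locations attend to language tokens; the attention weights act
# as a per-location affinity between the two modalities.
fused, affinity = attn(query=frame_feats, key=text_feats, value=text_feats)
print(fused.shape, affinity.shape)  # (1, 400, 256), (1, 400, 10)
```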
arXiv Detail & Related papers (2023-09-05T08:34:23Z)
- What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation [2.7036595757881323]
We build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS).
MESS allows a holistic analysis of performance across a wide range of domain-specific datasets.
We evaluate eight recently published models on the proposed MESS benchmark and analyze characteristics for the performance of zero-shot transfer models.
arXiv Detail & Related papers (2023-06-27T14:47:43Z)
- Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges [5.0243930429558885]
Few-Shot Semantic Segmentation is a novel task in computer vision, which aims at designing models capable of segmenting new semantic classes with only a few examples.
This paper presents a comprehensive survey of Few-Shot Semantic Segmentation, tracing its evolution and exploring various model designs.
arXiv Detail & Related papers (2023-04-12T13:07:37Z)
- Segment Anything [108.16489338211093]
We build the largest segmentation dataset to date, with over 1 billion masks on 11M licensed and privacy-respecting images.
The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.
We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive.
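For concreteness, promptability looks like this with the released segment_anything package (the checkpoint path is a placeholder for a file downloaded from the SAM repository):

```python
# Point-prompted segmentation with the released segment_anything package.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder path: download the ViT-B checkpoint from the SAM repository.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an RGB image
predictor.set_image(image)

# A single foreground click (label 1) at pixel (320, 240).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
```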
arXiv Detail & Related papers (2023-04-05T17:59:46Z)
- MSANet: Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation [0.0]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
Prototype learning, where the support feature yields a single prototype or several prototypes, has been widely used in FSS.
We propose a Multi-Similarity and Attention Network (MSANet) including two novel modules, a multi-similarity module and an attention module.
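A minimal sketch of the prototype idea this builds on, assuming generic feature maps (masked average pooling followed by cosine similarity; not MSANet's actual modules):

```python
# Masked average pooling over support features yields a class prototype;
# cosine similarity against query features gives a coarse segmentation prior.
import torch
import torch.nn.functional as F

def prototype_similarity(support_feat, support_mask, query_feat):
    # support_feat, query_feat: (D, H, W); support_mask: (H, W) in {0, 1}
    m = support_mask.flatten()                                   # (H*W,)
    proto = (support_feat.flatten(1) * m).sum(1) / m.sum().clamp(min=1)
    sim = F.cosine_similarity(
        query_feat.flatten(1), proto[:, None], dim=0
    ).view(query_feat.shape[1:])
    return sim  # (H, W) similarity map

sim = prototype_similarity(
    torch.randn(64, 32, 32),
    torch.randint(0, 2, (32, 32)).float(),
    torch.randn(64, 32, 32),
)
```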
arXiv Detail & Related papers (2022-06-20T09:14:17Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
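A hedged sketch of the entropy-based uncertainty idea, assuming per-pixel class probabilities (this mirrors the stated principle, not the exact EUMS procedure):

```python
# Keep only pixels whose predictive entropy is below a threshold; the rest
# are marked with the ignore index and excluded from self-training.
import torch

def filter_pseudo_labels(probs, threshold=0.5):
    # probs: (C, H, W) per-pixel class probabilities
    entropy = -(probs * probs.clamp(min=1e-8).log()).sum(dim=0)  # (H, W)
    pseudo = probs.argmax(dim=0)   # hard pseudo-labels
    keep = entropy < threshold     # low entropy = confident pixels
    pseudo[~keep] = 255            # 255 = ignore index (illustrative choice)
    return pseudo, keep

probs = torch.softmax(torch.randn(5, 32, 32), dim=0)
pseudo, keep = filter_pseudo_labels(probs)
```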
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning [1.0062040918634414]
Few-shot learning algorithms are designed to generalize well to new tasks with limited data.
We introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data.
arXiv Detail & Related papers (2020-09-23T17:01:09Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
- Objectness-Aware Few-Shot Semantic Segmentation [31.13009111054977]
We show how to increase overall model capacity to achieve improved performance.
We introduce objectness, which is class-agnostic and so not prone to overfitting.
Given only one annotated example of an unseen category, experiments show that our method outperforms state-of-the-art methods with respect to mIoU.
arXiv Detail & Related papers (2020-04-06T19:12:08Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
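One simple way a cross-reference gate could be realized, purely as an illustration with assumed shapes (not CRNet's exact module): gate each image's features by the other's channel statistics so that channels active in both images are emphasized.

```python
# Illustrative cross-reference gating between two feature maps: channels
# that respond in both images are kept, highlighting co-occurrent objects.
import torch

def cross_reference(fa, fb):
    # fa, fb: (D, H, W) feature maps of the two images
    ga = torch.sigmoid(fa.mean(dim=(1, 2)))  # (D,) channel descriptor of A
    gb = torch.sigmoid(fb.mean(dim=(1, 2)))  # (D,) channel descriptor of B
    common = ga * gb                         # high where both respond
    return fa * common[:, None, None], fb * common[:, None, None]

fa2, fb2 = cross_reference(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
```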
arXiv Detail & Related papers (2020-03-24T04:55:43Z)