SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
- URL: http://arxiv.org/abs/2412.11998v1
- Date: Mon, 16 Dec 2024 17:26:06 GMT
- Title: SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
- Authors: Savinay Nagendra, Kashif Rashid, Chaopeng Shen, Daniel Kifer
- Abstract summary: We show how to leverage existing vision foundation models (VFMs) to reduce the incremental cost of creating few-shot segmentation models for new domains. Specifically, we introduce SAMIC, a small network that learns how to prompt VFMs in order to segment new types of objects in domain-specific applications.
- Score: 6.900101619562999
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Few-shot segmentation is the problem of learning to identify specific types of objects (e.g., airplanes) in images from a small set of labeled reference images. The current state of the art is driven by resource-intensive construction of models for every new domain-specific application. Such models must be trained on enormous labeled datasets of unrelated objects (e.g., cars, trains, animals) so that their ``knowledge'' can be transferred to new types of objects. In this paper, we show how to leverage existing vision foundation models (VFMs) to reduce the incremental cost of creating few-shot segmentation models for new domains. Specifically, we introduce SAMIC, a small network that learns how to prompt VFMs in order to segment new types of objects in domain-specific applications. SAMIC enables any task to be approached as a few-shot learning problem. At 2.6 million parameters, it is 94% smaller than the leading models (e.g., having ResNet 101 backbone with 45+ million parameters). Even using 1/5th of the training data provided by one-shot benchmarks, SAMIC is competitive with, or sets the state of the art, on a variety of few-shot and semantic segmentation datasets including COCO-$20^i$, Pascal-$5^i$, PerSeg, FSS-1000, and NWPU VHR-10.
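The abstract does not include code; as a rough illustration of the idea, the sketch below shows a tiny trainable network that maps pooled query features and a support-class embedding to spatial point prompts for a frozen promptable VFM such as SAM. All names and shapes here (PromptNet, feat_dim, num_points) are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of the SAMIC idea: a small trainable network predicts
# spatial point prompts for a frozen promptable segmenter (e.g., SAM).
# Names and shapes are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class PromptNet(nn.Module):
    """Maps query features plus a support-class embedding to k point prompts."""
    def __init__(self, feat_dim: int = 256, num_points: int = 5):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * 2, 256),
            nn.ReLU(),
            nn.Linear(256, num_points * 3),  # (x, y, fg/bg logit) per point
        )

    def forward(self, query_feat, support_emb):
        # query_feat: (B, D) pooled query-image features; support_emb: (B, D)
        out = self.mlp(torch.cat([query_feat, support_emb], dim=-1))
        out = out.view(-1, self.num_points, 3)
        coords = out[..., :2].sigmoid()    # normalized (x, y) in [0, 1]
        labels = (out[..., 2] > 0).long()  # 1 = foreground, 0 = background
        return coords, labels

net = PromptNet()
coords, labels = net(torch.randn(1, 256), torch.randn(1, 256))
print(coords.shape, labels.shape)  # torch.Size([1, 5, 2]) torch.Size([1, 5])
```

On this reading, only the small prompt network is trained for a new domain; the VFM itself stays frozen, which is what keeps the incremental cost at a few million parameters.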
Related papers
- No time to train! Training-Free Reference-Based Instance Segmentation [15.061599989448867]
This work investigates the task of object segmentation when provided with only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks, and instantiate our ideas via a multi-stage, training-free method.
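One plausible reading of the correspondence idea, sketched under assumed shapes (the function and threshold below are illustrative, not the paper's code): compare masked reference features against every target location and threshold the similarity to seed instance masks.

```python
# Illustrative correspondence computation between foundation-model features
# of a reference image (with its object mask) and a target image.
import torch
import torch.nn.functional as F

def correspondence_map(ref_feats, ref_mask, tgt_feats):
    # ref_feats, tgt_feats: (D, H, W) feature maps; ref_mask: (H, W) in {0, 1}
    _, H, W = tgt_feats.shape
    fg = ref_feats.flatten(1)[:, ref_mask.flatten().bool()]  # (D, N_fg)
    proto = F.normalize(fg.mean(dim=1), dim=0)               # mean fg feature
    tgt = F.normalize(tgt_feats.flatten(1), dim=0)           # (D, H*W)
    return (proto @ tgt).view(H, W)                          # cosine similarity

ref_feats, tgt_feats = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
ref_mask = torch.zeros(32, 32); ref_mask[8:16, 8:16] = 1
seed = correspondence_map(ref_feats, ref_mask, tgt_feats) > 0.2  # toy threshold
```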
arXiv Detail & Related papers (2025-07-03T16:59:01Z)
- LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery [45.87124064438554]
We introduce LISAt, a vision-language model designed to describe complex remote-sensing scenes. We trained LISAt on a new curated geospatial reasoning-segmentation dataset, GRES, with 27,615 annotations over 9,205 images. LISAt outperforms state-of-the-art open-domain models on reasoning segmentation tasks by 143.36% (gIoU).
arXiv Detail & Related papers (2025-05-05T17:56:25Z)
- TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation [40.49924427388922]
We propose a task-adaptive auto-visual prompt framework for Cross-Domain Few-Shot Segmentation (CD-FSS).
We incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain feature extraction and generate high-quality, learnable visual prompts.
Our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
arXiv Detail & Related papers (2024-09-09T07:43:58Z)
- Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples [61.66967790884943]
Referring video object segmentation (RVOS) relies on sufficient data for a given scene.
In more realistic scenarios, only minimal annotations are available for a new scene.
We propose a model with a newly designed cross-modal affinity (CMA) module based on a Transformer architecture.
The CMA module builds multimodal affinity from only a few samples, quickly learning new semantic information and enabling the model to adapt to different scenarios.
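The summary does not specify the CMA internals; a generic cross-attention sketch of computing affinity between language tokens and frame features, with assumed dimensions, might look like this:

```python
# Generic cross-attention sketch of a cross-modal affinity computation
# (illustrative shapes; not the paper's actual CMA module).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
text_feats = torch.randn(1, 10, 256)    # referring-expression tokens
frame_feats = torch.randn(1, 400, 256)  # flattened 20x20 frame features

# Visual locations attend to language tokens; the attention weights act
# as a per-location affinity between the two modalities.
fused, affinity = attn(query=frame_feats, key=text_feats, value=text_feats)
print(fused.shape, affinity.shape)  # (1, 400, 256), (1, 400, 10)
```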
arXiv Detail & Related papers (2023-09-05T08:34:23Z)
- What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation [2.7036595757881323]
We build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS).
MESS allows a holistic analysis of performance across a wide range of domain-specific datasets.
We evaluate eight recently published models on the proposed MESS benchmark and analyze characteristics for the performance of zero-shot transfer models.
arXiv Detail & Related papers (2023-06-27T14:47:43Z)
- Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges [5.0243930429558885]
Few-Shot Semantic Segmentation is a novel task in computer vision, which aims at designing models capable of segmenting new semantic classes with only a few examples.
This paper presents a comprehensive survey of Few-Shot Semantic Segmentation, tracing its evolution and exploring various model designs.
arXiv Detail & Related papers (2023-04-12T13:07:37Z)
- Segment Anything [108.16489338211093]
We build the largest segmentation dataset to date, with over 1 billion masks on 11M licensed and privacy-respecting images.
The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.
We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive.
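For concreteness, promptability looks like this with the released segment_anything package (the checkpoint path is a placeholder for a file downloaded from the SAM repository):

```python
# Point-prompted segmentation with the released segment_anything package.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder path: download the ViT-B checkpoint from the SAM repository.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an RGB image
predictor.set_image(image)

# A single foreground click (label 1) at pixel (320, 240).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
```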
arXiv Detail & Related papers (2023-04-05T17:59:46Z)
- MSANet: Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation [0.0]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
Prototype learning, where the support feature yields a single prototype or several prototypes, has been widely used in FSS.
We propose a Multi-Similarity and Attention Network (MSANet) including two novel modules, a multi-similarity module and an attention module.
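A minimal sketch of the prototype idea this builds on, assuming generic feature maps (masked average pooling followed by cosine similarity; not MSANet's actual modules):

```python
# Masked average pooling over support features yields a class prototype;
# cosine similarity against query features gives a coarse segmentation prior.
import torch
import torch.nn.functional as F

def prototype_similarity(support_feat, support_mask, query_feat):
    # support_feat, query_feat: (D, H, W); support_mask: (H, W) in {0, 1}
    m = support_mask.flatten()                                   # (H*W,)
    proto = (support_feat.flatten(1) * m).sum(1) / m.sum().clamp(min=1)
    sim = F.cosine_similarity(
        query_feat.flatten(1), proto[:, None], dim=0
    ).view(query_feat.shape[1:])
    return sim  # (H, W) similarity map

sim = prototype_similarity(
    torch.randn(64, 32, 32),
    torch.randint(0, 2, (32, 32)).float(),
    torch.randn(64, 32, 32),
)
```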
arXiv Detail & Related papers (2022-06-20T09:14:17Z)
- Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z)
- Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
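A hedged sketch of the entropy-based uncertainty idea, assuming per-pixel class probabilities (this mirrors the stated principle, not the exact EUMS procedure):

```python
# Keep only pixels whose predictive entropy is below a threshold; the rest
# are marked with the ignore index and excluded from self-training.
import torch

def filter_pseudo_labels(probs, threshold=0.5):
    # probs: (C, H, W) per-pixel class probabilities
    entropy = -(probs * probs.clamp(min=1e-8).log()).sum(dim=0)  # (H, W)
    pseudo = probs.argmax(dim=0)   # hard pseudo-labels
    keep = entropy < threshold     # low entropy = confident pixels
    pseudo[~keep] = 255            # 255 = ignore index (illustrative choice)
    return pseudo, keep

probs = torch.softmax(torch.randn(5, 32, 32), dim=0)
pseudo, keep = filter_pseudo_labels(probs)
```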
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
- Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning [1.0062040918634414]
Few-shot learning algorithms are designed to generalize well to new tasks with limited data.
We introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data.
arXiv Detail & Related papers (2020-09-23T17:01:09Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
- Objectness-Aware Few-Shot Semantic Segmentation [31.13009111054977]
We show how to increase overall model capacity to achieve improved performance.
We introduce objectness, which is class-agnostic and so not prone to overfitting.
Given only one annotated example of an unseen category, experiments show that our method outperforms state-of-the-art methods with respect to mIoU.
arXiv Detail & Related papers (2020-04-06T19:12:08Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
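One simple way a cross-reference gate could be realized, purely as an illustration with assumed shapes (not CRNet's exact module): gate each image's features by the other's channel statistics so that channels active in both images are emphasized.

```python
# Illustrative cross-reference gating between two feature maps: channels
# that respond in both images are kept, highlighting co-occurrent objects.
import torch

def cross_reference(fa, fb):
    # fa, fb: (D, H, W) feature maps of the two images
    ga = torch.sigmoid(fa.mean(dim=(1, 2)))  # (D,) channel descriptor of A
    gb = torch.sigmoid(fb.mean(dim=(1, 2)))  # (D,) channel descriptor of B
    common = ga * gb                         # high where both respond
    return fa * common[:, None, None], fb * common[:, None, None]

fa2, fb2 = cross_reference(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
```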
arXiv Detail & Related papers (2020-03-24T04:55:43Z)