No time to train! Training-Free Reference-Based Instance Segmentation
- URL: http://arxiv.org/abs/2507.02798v2
- Date: Sat, 05 Jul 2025 08:15:07 GMT
- Title: No time to train! Training-Free Reference-Based Instance Segmentation
- Authors: Miguel Espinosa, Chenhongyi Yang, Linus Ericsson, Steven McDonagh, Elliot J. Crowley,
- Abstract summary: This work investigates the task of object segmentation when provided with only a small set of reference images.<n>Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image.<n>We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method.
- Score: 15.061599989448867
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this original problem through a promptable, semantics-agnostic, segmentation paradigm and yet still requires manual visual-prompts or complex domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided with, alternatively, only a small set of reference images. Our key insight is to leverage strong semantic priors, as learned by foundation models, to identify corresponding regions between a reference and a target image. We find that correspondences enable automatic generation of instance-level segmentation masks for downstream tasks and instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction; (2) representation aggregation and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP), PASCAL VOC Few-Shot (71.2% nAP50) and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
Related papers
- One-shot In-context Part Segmentation [97.77292483684877]
We present the One-shot In-context Part (OIParts) framework to tackle the challenges of part segmentation.<n>Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient.<n>We have achieved remarkable segmentation performance across diverse object categories.
arXiv Detail & Related papers (2025-03-03T03:50:54Z) - SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches [116.1810651297801]
SketchYourSeg establishes freehand sketches as a powerful query modality for subjective image segmentation.<n>Our evaluations demonstrate superior performance over existing approaches across diverse benchmarks.
arXiv Detail & Related papers (2025-01-27T13:07:51Z) - Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation [15.941958367737408]
Seg-TTO is a framework for zero-shot, open-vocabulary semantic segmentation.<n>We focus on segmentation-specific test-time optimization to address this gap.<n>Seg-TTO demonstrates clear performance improvements (up to 27% mIoU increase on some datasets) establishing new state-of-the-art.
arXiv Detail & Related papers (2025-01-08T18:58:24Z) - Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and
Local Consensus Guided Cross Attention [7.939095881813804]
Few-shot segmentation aims to train a segmentation model that can fast adapt to a novel task for which only a few annotated images are provided.
We introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects.
The proposed IDA effectively increases the support set's diversity and promotes the distribution consistency between support and query images.
arXiv Detail & Related papers (2024-01-18T10:29:10Z) - Exploring Open-Vocabulary Semantic Segmentation without Human Labels [76.15862573035565]
We present ZeroSeg, a novel method that leverages the existing pretrained vision-language model (VL) to train semantic segmentation models.
ZeroSeg overcomes this by distilling the visual concepts learned by VL models into a set of segment tokens, each summarizing a localized region of the target image.
Our approach achieves state-of-the-art performance when compared to other zero-shot segmentation methods under the same training data.
arXiv Detail & Related papers (2023-06-01T08:47:06Z) - AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z) - Instance Segmentation of Unlabeled Modalities via Cyclic Segmentation
GAN [27.936725483892076]
We propose a novel Cyclic Generative Adrial Network (CySGAN) that conducts image translation and instance segmentation jointly.
We benchmark our approach on the task of 3D neuronal nuclei segmentation with annotated electron microscopy (EM) images and unlabeled expansion microscopy (ExM) data.
arXiv Detail & Related papers (2022-04-06T20:46:39Z) - A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained
Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition working well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target for zero-shot semantic segmentation, by building it on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-arts by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS)
Our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM)
arXiv Detail & Related papers (2021-08-04T20:09:21Z) - SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive
Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most of advanced solutions exploit a metric learning framework that performs segmentation through matching each pixel to a learned foreground prototype.
This framework suffers from biased classification due to incomplete construction of sample pairs with the foreground prototype only.
arXiv Detail & Related papers (2021-04-19T11:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.