INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
- URL: http://arxiv.org/abs/2501.18753v1
- Date: Thu, 30 Jan 2025 21:07:14 GMT
- Title: INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
- Authors: Jian Hu, Zixu Cheng, Shaogang Gong
- Abstract summary: We introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT).
INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures each image instance segmentation correctly matches the semantics of the instance-specific prompts.
INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
- Score: 31.734740711205227
- License:
- Abstract: Task-generic promptable image segmentation aims to segment diverse samples under a single task description using only one task-generic prompt. Current methods leverage the generalization capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts become poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge whilst increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures each image instance segmentation correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
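The idea of selecting the most plausible candidate by contrast and progressively demoting the rest can be illustrated with a toy sketch. This is not the authors' implementation: the function `mine_negatives`, the `penalty` parameter, and the assumption that contrast is measured as the drop in a candidate's VLM score when an image region is masked are all hypothetical choices made only for illustration.

```python
from typing import Dict, Tuple


def mine_negatives(
    full_scores: Dict[str, float],
    masked_scores: Dict[str, float],
    negatives: Dict[str, float],
    penalty: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Run one hypothetical round of negative mining over candidate prompts.

    full_scores:   candidate score on the unmodified image (assumed VLM output).
    masked_scores: candidate score after masking an image region (assumed).
    negatives:     accumulated penalties from earlier rounds.
    """
    contrast = {}
    for name, score in full_scores.items():
        # A large drop under masking suggests the candidate relies on image
        # evidence rather than on prior knowledge alone (an assumption here).
        drop = score - masked_scores.get(name, 0.0)
        # Candidates already marked as negatives are pushed down further.
        contrast[name] = drop - negatives.get(name, 0.0)

    best = max(contrast, key=contrast.get)

    # Every non-selected candidate accumulates a penalty proportional to
    # how far its contrast falls below the selected candidate's contrast.
    updated = dict(negatives)
    for name, value in contrast.items():
        if name != best:
            gap = max(0.0, contrast[best] - value)
            updated[name] = updated.get(name, 0.0) + penalty * gap
    return best, updated


# Toy scores, invented for illustration only.
full = {"leaf": 0.75, "frog": 0.5, "rock": 0.25}
masked = {"leaf": 0.75, "frog": 0.0, "rock": 0.25}
best, negatives = mine_negatives(full, masked, negatives={})
print(best, negatives)
```

Calling the function again with the returned `negatives` mimics the progressive filtering described in the abstract: candidates that were rejected in earlier rounds need increasingly strong image evidence to be selected later.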
Related papers
- Instance-Aware Generalized Referring Expression Segmentation [32.96760407482406]
InstAlign is a method that incorporates object-level reasoning into the segmentation process.
Our method significantly advances state-of-the-art performance, setting a new standard for precise and flexible GRES.
arXiv Detail & Related papers (2024-11-22T17:28:43Z)
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D is a visual-language task that segments all points of the specified object from a 3D point cloud described by a query sentence.
We propose a novel Referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z)
- Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation [74.04806143723597]
We introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator.
The prompt generator uses multi-scale chain-of-thought prompting, initially exploring hallucinations to extract extended contextual knowledge on a test image.
The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks.
arXiv Detail & Related papers (2024-08-27T17:06:22Z)
- Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects [32.14438610147615]
We introduce a test-time per-instance adaptation mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts.
Experiments on three benchmarks demonstrate that GenSAM outperforms point supervision approaches.
arXiv Detail & Related papers (2023-12-12T15:43:36Z)
- Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models [61.46999584579775]
General purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts.
In particular, input images are pre-processed by an image encoder to obtain embedding vectors which are later used for mask predictions.
We show that even imperceptible perturbations of radius $\epsilon=1/255$ are often sufficient to drastically modify the masks predicted with point, box and text prompts.
arXiv Detail & Related papers (2023-11-24T12:57:34Z)
- Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs.
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP.
Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
- Explicit Visual Prompting for Low-Level Structure Segmentations [55.51869354956533]
We propose a new visual prompting model, named Explicit Visual Prompting (EVP).
EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters.
EVP also achieves state-of-the-art performances on diverse low-level structure segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient unseen-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)