Related papers: INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation

URL: http://arxiv.org/abs/2501.18753v1
Date: Thu, 30 Jan 2025 21:07:14 GMT
Title: INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
Authors: Jian Hu, Zixu Cheng, Shaogang Gong,
Abstract summary: We introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures that each image instance's segmentation correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
Score: 31.734740711205227
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Task-generic promptable image segmentation aims to segment diverse samples under a single task description using only one task-generic prompt. Current methods leverage the generalization capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts become poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge whilst increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; (2) semantic mask generation, which ensures that each image instance's segmentation correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
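To make the idea of "negative mining with higher contrast" more concrete, here is a minimal, hypothetical sketch; it is not INT's actual algorithm. It assumes each candidate instance-specific prompt has a VLM score on the full image (`pos_score`) and a score on some "negative" view (`neg_score`, e.g. with the instance region removed). The contrast between the two serves as evidence, and low-contrast candidates are progressively dropped while survivors are reweighted. The class, function names, scores, `temperature` and `min_weight` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidatePrompt:
    text: str
    pos_score: float  # hypothetical: VLM score for the prompt on the full image
    neg_score: float  # hypothetical: VLM score on a "negative" view (instance removed)

    @property
    def contrast(self) -> float:
        # Higher contrast = the evidence depends more on the instance itself
        return self.pos_score - self.neg_score


def softmax(values, temperature):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mine_prompts(candidates, temperature=0.5, min_weight=0.2):
    """Progressively drop low-contrast candidates and reweight the survivors."""
    pool = list(candidates)
    while True:
        weights = softmax([c.contrast for c in pool], temperature)
        kept = [c for c, w in zip(pool, weights) if w >= min_weight]
        # Stop once nothing more is filtered (or filtering would remove everything)
        if not kept or len(kept) == len(pool):
            return {c.text: w for c, w in zip(pool, weights)}
        pool = kept


cands = [
    CandidatePrompt("polyp", 0.9, 0.2),
    CandidatePrompt("tumour", 0.6, 0.4),
    CandidatePrompt("shadow", 0.5, 0.5),
]
print(mine_prompts(cands))
```

With these toy scores, "shadow" (zero contrast) is removed in the first round. The remaining weights are then renormalised to roughly 0.73 for "polyp" and 0.27 for "tumour", so the most plausible prompt gains influence while the weaker ones lose it.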

Related papers

Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation [9.862714096455175]
We propose a novel training-free test-time adaptation framework that synergizes Region-constrained Dual-stream Visual Prompting (RDVP) via Multimodal Stepwise Decomposition Chain of Thought (MSD-CoT). RDVP injects spatial constraints into visual and independently samples visual prompts for foreground and background points, effectively mitigating semantic discrepancy and
arXiv Detail & Related papers (2025-06-07T14:50:26Z)
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts [64.93416171745693]
ThinkFirst is a training-free reasoning segmentation framework. Our approach allows GPT-4o or other powerful MLLMs to generate a detailed, chain-of-thought description of an image. This summarized description is then passed to a language-instructed segmentation assistant to aid the segmentation process.
arXiv Detail & Related papers (2025-03-10T16:26:11Z)
Instance-Aware Generalized Referring Expression Segmentation [32.96760407482406]
InstAlign is a method that incorporates object-level reasoning into the segmentation process. Our method significantly advances state-of-the-art performance, setting a new standard for precise and flexible GRES.
arXiv Detail & Related papers (2024-11-22T17:28:43Z)
Insight Any Instance: Promptable Instance Segmentation for Remote Sensing Images [0.0]
Instance segmentation of remote sensing images (RSIs) is an essential task for a wide range of applications such as land planning and intelligent transport. Most instance segmentation models are based on deep feature learning and contain operations such as multiple downsampling. Inspired by the recent superior performance of prompt learning in visual tasks, we propose a new prompt paradigm to address the above issues.
arXiv Detail & Related papers (2024-09-11T05:31:50Z)
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation [74.04806143723597]
We introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator. The prompt generator uses a multi-scale chain of thought prompting, initially exploring hallucinations for extracting extended contextual knowledge on a test image. The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks.
arXiv Detail & Related papers (2024-08-27T17:06:22Z)
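The prompt-mask cycle described for ProMaC can be pictured as an alternating loop. The sketch below is a hypothetical illustration, not ProMaC's implementation. `generate_prompt` and `generate_mask` are stand-ins for the prompt generator (e.g. an MLLM) and the mask generator (e.g. SAM). Masks are modelled as sets of foreground pixel coordinates, and the IoU-based stopping rule with threshold `tol` is an assumption.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Mask = FrozenSet[Tuple[int, int]]  # foreground pixel coordinates


def iou(a: Mask, b: Mask) -> float:
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def prompt_mask_cycle(
    generate_prompt: Callable[[object, Mask], str],
    generate_mask: Callable[[object, str], Mask],
    image: object,
    max_iters: int = 5,
    tol: float = 0.95,
) -> Tuple[Optional[str], Mask]:
    mask: Mask = frozenset()  # no mask yet on the first pass
    prompt: Optional[str] = None
    for _ in range(max_iters):
        # The current mask steers the prompt generator towards task-relevant regions
        prompt = generate_prompt(image, mask)
        new_mask = generate_mask(image, prompt)
        # Stop once successive masks agree closely
        if mask and iou(mask, new_mask) >= tol:
            return prompt, new_mask
        mask = new_mask
    return prompt, mask


# Toy stand-ins for the two generators (hypothetical)
def toy_prompt(image, mask):
    return "coarse" if not mask else "refined"


def toy_mask(image, prompt):
    size = 3 if prompt == "coarse" else 2
    return frozenset((r, c) for r in range(size) for c in range(size))


prompt, mask = prompt_mask_cycle(toy_prompt, toy_mask, image=None)
print(prompt, len(mask))
```

In the toy run, the first pass yields a coarse 3x3 mask. Feeding that mask back produces a refined prompt and a 2x2 mask, and the loop stops once two consecutive masks are identical.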
Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects [32.14438610147615]
We introduce a test-time adaptation per-instance mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts. Experiments on three benchmarks demonstrate that GenSAM outperforms point supervision approaches.
arXiv Detail & Related papers (2023-12-12T15:43:36Z)
Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models [61.46999584579775]
General-purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts. In particular, input images are pre-processed by an image encoder to obtain embedding vectors, which are later used for mask predictions. We show that even imperceptible perturbations of radius $\epsilon = 1/255$ are often sufficient to drastically modify the masks predicted with point, box and text prompts.
arXiv Detail & Related papers (2023-11-24T12:57:34Z)
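For scale, on images normalised to [0, 1], a radius of $\epsilon = 1/255$ lets each pixel move by at most one 8-bit intensity level. The sketch below shows a generic projected-gradient (PGD) ascent step under such an $L_\infty$ budget on a flattened image. It is not the paper's exact attack: the gradient `grad` of whatever loss is being maximised (for a prompt-agnostic attack, plausibly a distance between clean and perturbed encoder embeddings) is assumed to be given, and the step size `alpha` is an illustrative choice.

```python
EPS = 1 / 255  # perturbation radius quoted in the summary above


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_step(x_clean, x_adv, grad, alpha=0.5 / 255, eps=EPS):
    """One L-infinity projected-gradient ascent step on a flattened image in [0, 1]."""
    out = []
    for c, a, g in zip(x_clean, x_adv, grad):
        a = a + alpha * sign(g)            # move to increase the (assumed) loss
        a = min(max(a, c - eps), c + eps)  # project back into the eps-ball around the clean pixel
        a = min(max(a, 0.0), 1.0)          # keep a valid pixel value
        out.append(a)
    return out


x = [0.5, 0.0, 1.0]
g = [1.0, -1.0, 1.0]
adv = pgd_step(x, x, g, alpha=2 / 255)
print(adv)
```

Even with a step larger than the budget, the projection caps every pixel change at $1/255$. A second clamp keeps values such as 0.0 and 1.0 inside the valid image range.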
Explicit Visual Prompting for Universal Foreground Segmentations [55.51869354956533]
We present a unified framework for a number of foreground segmentation tasks without any task-specific designs. We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP. Our method freezes a pre-trained model and then learns task-specific knowledge using a few extra parameters.
arXiv Detail & Related papers (2023-05-29T11:05:01Z)
Explicit Visual Prompting for Low-Level Structure Segmentations [55.51869354956533]
We propose a new visual prompting model, named Explicit Visual Prompting (EVP). EVP significantly outperforms other parameter-efficient tuning protocols under the same amount of tunable parameters. EVP also achieves state-of-the-art performance on diverse low-level structure segmentation tasks.
arXiv Detail & Related papers (2023-03-20T06:01:53Z)
Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation. We introduce a novel approach for more accurate and efficient spatio-temporal segmentation. We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.