A Simple yet Powerful Instance-Aware Prompting Framework for Training-free Camouflaged Object Segmentation
- URL: http://arxiv.org/abs/2508.06904v2
- Date: Wed, 13 Aug 2025 07:40:08 GMT
- Title: A Simple yet Powerful Instance-Aware Prompting Framework for Training-free Camouflaged Object Segmentation
- Authors: Chao Yin, Jide Li, Xiaoqiang Li
- Abstract summary: We propose a training-free Camouflaged Object Segmentation (COS) pipeline, the Instance-Aware Prompting Framework (IAPF), that explicitly converts a task-generic prompt into fine-grained instance masks. The proposed IAPF significantly surpasses existing state-of-the-art training-free COS methods.
- Score: 6.712332323439369
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camouflaged Object Segmentation (COS) remains highly challenging due to the intrinsic visual similarity between target objects and their surroundings. While training-based COS methods achieve good performance, their performance degrades rapidly with increased annotation sparsity. To circumvent this limitation, recent studies have explored training-free COS methods, leveraging the Segment Anything Model (SAM) by automatically generating visual prompts from a single task-generic prompt (\textit{e.g.}, "\textit{camouflaged animal}") uniformly applied across all test images. However, these methods typically produce only semantic-level visual prompts, causing SAM to output coarse semantic masks and thus failing to handle scenarios with multiple discrete camouflaged instances effectively. To address this critical limitation, we propose a simple yet powerful \textbf{I}nstance-\textbf{A}ware \textbf{P}rompting \textbf{F}ramework (IAPF), the first training-free COS pipeline that explicitly converts a task-generic prompt into fine-grained instance masks. Specifically, the IAPF comprises three steps: (1) Text Prompt Generator, utilizing task-generic queries to prompt a Multimodal Large Language Model (MLLM) for generating image-specific foreground and background tags; (2) \textbf{Instance Mask Generator}, leveraging Grounding DINO to produce precise instance-level bounding box prompts, alongside the proposed Single-Foreground Multi-Background Prompting strategy to sample region-constrained point prompts within each box, enabling SAM to yield a candidate instance mask; (3) Self-consistency Instance Mask Voting, which selects the final COS prediction by identifying the candidate mask most consistent across multiple candidate instance masks. Extensive evaluations on standard COS benchmarks demonstrate that the proposed IAPF significantly surpasses existing state-of-the-art training-free COS methods.
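The self-consistency voting step (3) lends itself to a compact sketch. The abstract does not specify the consistency metric, so the mean pairwise IoU used below is an assumption, and the candidate masks are toy pixel-coordinate sets rather than real SAM outputs:

```python
def iou(a, b):
    """Intersection-over-union of two binary masks given as sets of pixel coords."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def self_consistency_vote(candidates):
    """Pick the candidate mask most consistent with the others,
    scoring each by its mean IoU against all remaining candidates
    (an assumed reading of the paper's voting step)."""
    scores = []
    for i, m in enumerate(candidates):
        others = [iou(m, o) for j, o in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    return max(range(len(candidates)), key=scores.__getitem__)

# Toy example: two candidates agree closely, one is an outlier.
c0 = {(0, 0), (0, 1), (1, 0), (1, 1)}
c1 = {(0, 0), (0, 1), (1, 0)}
c2 = {(5, 5), (5, 6)}
print(self_consistency_vote([c0, c1, c2]))  # -> 0 (the mask closest to the consensus)
```

The outlier `c2` has zero overlap with the others, so either of the two overlapping candidates wins the vote; ties fall to the earlier index.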
Related papers
- Discover, Segment, and Select: A Progressive Mechanism for Zero-shot Camouflaged Object Segmentation [40.66340261994875]
DSS is a progressive framework designed to refine segmentation step by step. It achieves state-of-the-art performance on multiple COS benchmarks, especially in multiple-instance scenes.
arXiv Detail & Related papers (2026-02-23T15:15:37Z)
- Segment Concealed Objects with Incomplete Supervision [63.637733655439334]
Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments. This task remains highly challenging due to the limited supervision provided by incompletely annotated training data. In this paper, we introduce the first unified method for ISCOS to address these challenges.
arXiv Detail & Related papers (2025-06-10T16:25:15Z)
- Pro2SAM: Mask Prompt to SAM with Grid Points for Weakly Supervised Object Localization [54.91271106816616]
We propose an innovative mask-prompt-to-SAM (Pro2SAM) network with grid points for the WSOL task. First, we devise a Global Token Transformer (GTFormer) to generate a coarse-grained foreground map as a flexible mask prompt. Second, we deliver grid points as dense prompts to SAM to maximize the probability of the foreground mask.
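The grid-point idea can be illustrated with a toy sketch: lay a regular grid over the coarse foreground map and keep only points landing on likely-foreground cells. The stride, threshold, and list-of-lists map below are illustrative assumptions, not the paper's actual interface:

```python
def grid_point_prompts(foreground_map, stride=2, threshold=0.5):
    """Sample a regular grid of (row, col) point prompts, keeping only
    points whose cell in the coarse foreground map looks like foreground.
    Stride and threshold are assumed hyperparameters for illustration."""
    points = []
    for r in range(0, len(foreground_map), stride):
        for c in range(0, len(foreground_map[0]), stride):
            if foreground_map[r][c] >= threshold:
                points.append((r, c))
    return points

# 4x4 toy foreground-probability map with a bright patch in the top-left.
fmap = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.1, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.2],
]
print(grid_point_prompts(fmap))  # -> [(0, 0)]
```

Only the grid point at (0, 0) falls on a high-probability cell, so it alone would be passed on as a dense point prompt.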
arXiv Detail & Related papers (2025-05-08T02:44:53Z)
- E-SAM: Training-Free Segment Every Entity Model [22.29478489117426]
We introduce E-SAM, a novel training-free framework that exhibits exceptional entity segmentation (ES) capability. E-SAM achieves state-of-the-art performance compared to prior ES methods, demonstrating a significant improvement of +30.1 on benchmark metrics.
arXiv Detail & Related papers (2025-03-15T11:41:33Z)
- Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation [21.30568336073013]
We tackle the challenge of open-vocabulary segmentation, where we need to identify objects from a wide range of categories in different environments. Existing methods often use multi-modal models like CLIP, which combine image and text features in a shared embedding space. We propose Prompt-guided Mask Proposal (PMP), where the mask generator takes input text prompts and generates masks guided by these prompts.
arXiv Detail & Related papers (2024-12-13T17:22:50Z)
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to few-shot semantic segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation [74.04806143723597]
We introduce an iterative Prompt-Mask Cycle generation framework (ProMaC) with a prompt generator and a mask generator.
The prompt generator uses multi-scale chain-of-thought prompting, initially exploring hallucinations to extract extended contextual knowledge from a test image.
The generated masks iteratively induce the prompt generator to focus more on task-relevant image areas and reduce irrelevant hallucinations, resulting jointly in better prompts and masks.
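The prompt-mask cycle described above can be sketched as a loop over two pluggable components. The `propose_prompts` and `generate_mask` stubs below are hypothetical stand-ins for the MLLM prompt generator and the mask generator, reduced to a 1-D toy "image"; none of these interfaces come from the paper:

```python
def prompt_mask_cycle(image, propose_prompts, generate_mask, rounds=3):
    """Iterate prompts -> mask -> refined prompts, as in a ProMaC-style cycle.
    Each round, the current mask steers the next prompt toward task-relevant
    regions; the number of rounds is an assumed hyperparameter."""
    mask, prompt = None, None
    for _ in range(rounds):
        prompt = propose_prompts(image, mask)  # refine prompt from current mask
        mask = generate_mask(image, prompt)    # segment with the refined prompt
    return prompt, mask

image = [0.1, 0.9, 0.8, 0.2, 0.95]

def propose_prompts(img, mask):
    # Hypothetical stand-in: first round uses a generic threshold; later
    # rounds tighten it toward the mean intensity of currently-masked pixels.
    if mask is None:
        return 0.5
    vals = [img[i] for i in mask]
    return sum(vals) / len(vals) - 0.1

def generate_mask(img, thr):
    # Hypothetical stand-in for the mask generator: keep pixels above the prompt.
    return [i for i, v in enumerate(img) if v >= thr]

prompt, mask = prompt_mask_cycle(image, propose_prompts, generate_mask)
print(mask)  # -> [1, 2, 4]
```

On this toy input the cycle converges after one refinement: the mask stabilizes on the bright pixels, mirroring how the real framework settles on task-relevant regions.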
arXiv Detail & Related papers (2024-08-27T17:06:22Z)
- PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z)
- Training-free Object Counting with Prompts [12.358565655046977]
Existing approaches rely on extensive training data with point annotations for each object.
We propose a training-free object counter that treats the counting task as a segmentation problem.
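In the simplest reading, treating counting as segmentation reduces to counting the resulting instance masks. The sketch below counts 4-connected components of a toy binary segmentation as a hedged illustration; it is not the paper's actual SAM-based pipeline:

```python
from collections import deque

def count_objects(seg):
    """Count instances in a binary segmentation map by counting
    4-connected components (breadth-first flood fill)."""
    rows, cols = len(seg), len(seg[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seg[r][c] and not seen[r][c]:
                count += 1  # found a new, unvisited component
                seen[r][c] = True
                q = deque([(r, c)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and seg[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count

seg = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
print(count_objects(seg))  # -> 3
```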
arXiv Detail & Related papers (2023-06-30T13:26:30Z)
- Mask Encoding for Single Shot Instance Segmentation [97.99956029224622]
We propose a simple single-shot instance segmentation framework, termed mask encoding based instance segmentation (MEInst).
Instead of predicting the two-dimensional mask directly, MEInst distills it into a compact and fixed-dimensional representation vector.
We show that this much simpler and more flexible one-stage instance segmentation method can also achieve competitive performance.
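The core idea of distilling a 2-D mask into a fixed-dimensional vector can be illustrated with a toy encoder/decoder pair. MEInst learns a PCA-style linear encoding over training masks; the average-pooling stand-in below is only an assumed simplification that preserves the fixed-length-code shape of the technique:

```python
def encode_mask(mask, code_side=2):
    """Compress a square binary mask into a fixed-length code by average
    pooling (a stand-in for MEInst's learned linear encoding)."""
    block = len(mask) // code_side
    code = []
    for br in range(code_side):
        for bc in range(code_side):
            s = sum(mask[r][c]
                    for r in range(br * block, (br + 1) * block)
                    for c in range(bc * block, (bc + 1) * block))
            code.append(s / (block * block))
    return code

def decode_mask(code, code_side=2, out_side=4, threshold=0.5):
    """Reconstruct a binary mask by nearest-neighbour upsampling the code."""
    block = out_side // code_side
    return [[1 if code[(r // block) * code_side + c // block] >= threshold else 0
             for c in range(out_side)]
            for r in range(out_side)]

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
code = encode_mask(mask)         # fixed 4-dimensional representation
print(code)                      # -> [1.0, 0.0, 0.0, 0.0]
print(decode_mask(code) == mask) # -> True (exact on this block-aligned mask)
```

The prediction head then only needs to regress this short vector per instance instead of a full-resolution mask, which is what makes the one-stage design tractable.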
arXiv Detail & Related papers (2020-03-26T02:51:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.