Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
- URL: http://arxiv.org/abs/2510.11028v1
- Date: Mon, 13 Oct 2025 05:53:49 GMT
- Title: Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
- Authors: Yanning Hou, Ke Xu, Junfa Li, Yanran Ruan, Jianfeng Qiu
- Abstract summary: This paper proposes a novel two-stage framework for zero-shot anomaly segmentation in industrial anomaly detection. To mitigate SAM's inclination towards object segmentation, the authors propose the Co-Feature Point Prompt Generation module. To further refine SAM's segmentation results, they introduce the Cascaded Prompts for SAM (CPS) module.
- Score: 5.225009704851243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the powerful generalization ability exhibited by foundation models has brought forth new solutions for zero-shot anomaly segmentation tasks. However, correctly guiding these foundation models to address downstream tasks remains a challenge. This paper proposes a novel two-stage framework for zero-shot anomaly segmentation in industrial anomaly detection. The framework leverages the powerful anomaly localization capability of CLIP and the boundary perception ability of SAM. (1) To mitigate SAM's inclination towards object segmentation, we propose the Co-Feature Point Prompt Generation (PPG) module, which collaboratively utilizes CLIP and SAM to generate positive and negative point prompts, guiding SAM to focus on segmenting anomalous regions rather than the entire object. (2) To further refine SAM's segmentation results and mitigate rough boundaries and isolated noise, we introduce the Cascaded Prompts for SAM (CPS) module, which employs hybrid prompts cascaded with a lightweight SAM decoder to achieve precise segmentation of anomalous regions. Consistent experimental validation across multiple datasets demonstrates that our approach achieves state-of-the-art zero-shot anomaly segmentation results. Particularly noteworthy is our performance on the VisA dataset, where we outperform state-of-the-art methods by 10.3% and 7.7% in terms of $F_1$-max and AP, respectively.
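The abstract's core mechanism can be illustrated with a minimal sketch (not the authors' code): given a CLIP-derived anomaly score map, select high-scoring pixels as positive point prompts and low-scoring pixels as negative prompts, in the format SAM's predictor expects. The function name, threshold logic, and `k` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def generate_point_prompts(anomaly_map: np.ndarray, k: int = 3):
    """Pick the k highest-scoring pixels as positive prompts (label 1)
    and the k lowest-scoring pixels as negative prompts (label 0)."""
    h, w = anomaly_map.shape
    flat = anomaly_map.ravel()
    pos_idx = np.argsort(flat)[-k:]   # most anomalous pixels
    neg_idx = np.argsort(flat)[:k]    # most normal pixels
    # SAM's predictor takes (x, y) point coordinates plus per-point labels.
    def to_xy(idx):
        return np.stack([idx % w, idx // w], axis=1)
    points = np.concatenate([to_xy(pos_idx), to_xy(neg_idx)])
    labels = np.concatenate([np.ones(k, dtype=int), np.zeros(k, dtype=int)])
    return points, labels

# Toy example: a 4x4 score map with one anomalous pixel at (x=2, y=1).
scores = np.zeros((4, 4))
scores[1, 2] = 0.9
points, labels = generate_point_prompts(scores, k=1)
```

In a full pipeline these `points` and `labels` would be passed to SAM (e.g. a `SamPredictor.predict(point_coords=..., point_labels=...)` call) so that segmentation is steered toward the anomalous region rather than the whole object; the PPG module as described additionally uses SAM's own features when choosing the points.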
Related papers
- SCALER: SAM-Enhanced Collaborative Learning for Label-Deficient Concealed Object Segmentation [32.56241263919416]
Existing methods for label-deficient concealed object segmentation (LDCOS) rely on consistency constraints or Segment Anything Model (SAM)-based pseudo-labeling. This study investigates two key questions: (1) Can consistency constraints and SAM-based supervision be jointly integrated to better exploit complementary information and enhance the segmenter? (2) Beyond that, can the segmenter in turn guide SAM through reciprocal supervision, enabling mutual improvement?
arXiv Detail & Related papers (2025-11-22T17:48:17Z) - ST-SAM: SAM-Driven Self-Training Framework for Semi-Supervised Camouflaged Object Detection [14.06736878203419]
Semi-supervised Camouflaged Object Detection (SSCOD) aims to reduce reliance on costly pixel-level annotations. Existing SSCOD methods suffer from severe prediction bias and error propagation under scarce supervision. We propose ST-SAM, a highly annotation-efficient yet concise framework.
arXiv Detail & Related papers (2025-07-31T07:41:30Z) - ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction [57.930531826380836]
This work explores whether a foundational segmentation model can address label scarcity in the pixel-level vision task as an annotator for unlabeled images. We propose ConformalSAM, a novel SSSS framework which first calibrates the foundation model using the target domain's labeled data and then filters out unreliable pixel labels of unlabeled data.
arXiv Detail & Related papers (2025-07-21T17:02:57Z) - Segment Concealed Objects with Incomplete Supervision [63.637733655439334]
Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments. This task remains highly challenging due to the limited supervision provided by the incompletely annotated training data. In this paper, we introduce the first unified method for ISCOS to address these challenges.
arXiv Detail & Related papers (2025-06-10T16:25:15Z) - S^4M: Boosting Semi-Supervised Instance Segmentation with SAM [25.94737539065708]
Semi-supervised instance segmentation poses challenges due to limited labeled data. Current teacher-student frameworks still suffer from performance constraints due to unreliable pseudo-label quality.
arXiv Detail & Related papers (2025-04-07T17:59:10Z) - SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model [9.381558154295012]
We propose Perceptual-Consistency Clipping, which exploits attention-focus overlap as a clipping metric to significantly suppress outliers. We also propose Prompt-Aware Reconstruction, which incorporates visual-prompt interactions by leveraging cross-attention responses in the mask decoder. Our method achieves 11.7% higher mAP than the baseline on the segmentation task.
arXiv Detail & Related papers (2025-03-09T08:38:32Z) - Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning [63.55145330447408]
We propose a novel Self-Perception Tuning (SPT) method for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process.
arXiv Detail & Related papers (2024-11-26T08:33:25Z) - Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z) - AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning [61.666973416903005]
Segment Anything Model (SAM) has demonstrated its impressive generalization capabilities in open-world scenarios with the guidance of prompts.
We propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context.
arXiv Detail & Related papers (2024-06-01T16:21:39Z) - ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation [5.376142948115328]
We propose a CLIP and SAM collaboration framework called ClipSAM for ZSAS.
The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation.
In detail, we introduce a crucial Unified Multi-scale Cross-modal Interaction (UMCI) module for interacting with visual features.
arXiv Detail & Related papers (2024-01-23T11:20:03Z) - Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.