AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
- URL: http://arxiv.org/abs/2503.06661v1
- Date: Sun, 09 Mar 2025 15:22:52 GMT
- Title: AA-CLIP: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
- Authors: Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, S. Kevin Zhou
- Abstract summary: Anomaly detection (AD) identifies outliers for applications like defect and lesion detection. We propose Anomaly-Aware CLIP (AA-CLIP), which enhances CLIP's anomaly discrimination ability in both text and visual spaces. AA-CLIP is achieved through a straightforward yet effective two-stage approach.
- Score: 33.213400694016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Anomaly detection (AD) identifies outliers for applications like defect and lesion detection. While CLIP shows promise for zero-shot AD tasks due to its strong generalization capabilities, its inherent Anomaly-Unawareness leads to limited discrimination between normal and abnormal features. To address this problem, we propose Anomaly-Aware CLIP (AA-CLIP), which enhances CLIP's anomaly discrimination ability in both text and visual spaces while preserving its generalization capability. AA-CLIP is achieved through a straightforward yet effective two-stage approach: it first creates anomaly-aware text anchors to differentiate normal and abnormal semantics clearly, then aligns patch-level visual features with these anchors for precise anomaly localization. This two-stage strategy, with the help of residual adapters, gradually adapts CLIP in a controlled manner, achieving effective AD while maintaining CLIP's class knowledge. Extensive experiments validate AA-CLIP as a resource-efficient solution for zero-shot AD tasks, achieving state-of-the-art results in industrial and medical applications. The code is available at https://github.com/Mwxinnn/AA-CLIP.
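The abstract's two-stage recipe can be made concrete with a short sketch. The code below is illustrative only, not the authors' released implementation (see the repository linked above); the module names, feature sizes, and adapter shape are assumptions. It shows how the stage-2 patch-level alignment against the stage-1 text anchors could be scored, with a residual adapter keeping the frozen CLIP features dominant:

```python
# Illustrative sketch of the two-stage idea (not the authors' released code).
# Stage 1 learns "normal"/"abnormal" text anchors; stage 2 aligns patch
# features to those anchors through a lightweight residual adapter.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAdapter(nn.Module):
    """Small bottleneck MLP added to frozen CLIP features (hypothetical sizes)."""
    def __init__(self, dim=768, hidden=64, scale=0.1):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)
        self.scale = scale

    def forward(self, x):
        # Residual form keeps the frozen CLIP feature dominant, which is how
        # the abstract describes preserving CLIP's class knowledge.
        return x + self.scale * self.up(F.relu(self.down(x)))

def anomaly_map(patch_feats, text_anchors, adapter):
    """patch_feats: (B, N, D) frozen CLIP patch tokens.
    text_anchors: (2, D) normal/abnormal text anchors from stage 1."""
    p = F.normalize(adapter(patch_feats), dim=-1)
    t = F.normalize(text_anchors, dim=-1)
    logits = p @ t.T                      # (B, N, 2) patch-to-anchor similarity
    return logits.softmax(-1)[..., 1]     # probability of "abnormal" per patch
```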
Related papers
- Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions.
We propose a novel approach that conditions the prompts of the text encoder based on image context extracted from the vision encoder.
Our method achieves state-of-the-art performance, with improvements of 2% to 29% across different metrics on 14 datasets.
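The prompt-conditioning idea reads naturally as a small module. Below is a minimal sketch under assumed names and sizes (not the Crane implementation): the global image embedding from the vision encoder is projected into a bias that shifts learnable context tokens before they enter the text encoder.

```python
# Minimal sketch of image-conditioned prompts; names and sizes are assumptions.
import torch
import torch.nn as nn

class ContextGuidedPrompt(nn.Module):
    def __init__(self, n_ctx=8, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable prompt tokens
        self.meta = nn.Linear(dim, dim)                          # maps image context to a bias

    def forward(self, image_emb):
        # image_emb: (B, D) global feature from the vision encoder
        bias = self.meta(image_emb).unsqueeze(1)   # (B, 1, D)
        return self.ctx.unsqueeze(0) + bias        # (B, n_ctx, D) conditioned prompts
```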
arXiv Detail & Related papers (2025-04-15T10:42:25Z)
- OFF-CLIP: Improving Normal Detection Confidence in Radiology CLIP with Simple Off-Diagonal Term Auto-Adjustment [6.085134938844728]
We propose OFF-CLIP, a contrastive learning refinement that improves normal detection. OFF-CLIP can be applied to radiology CLIP models without requiring any architectural modifications.
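The title's "off-diagonal term auto-adjustment" suggests a loss-level change. The snippet below is a hedged sketch of one plausible reading, not OFF-CLIP's actual formulation: off-diagonal image-text pairs that are both normal are removed from the negative set, so normal studies are not pushed apart from normal reports.

```python
# Hedged sketch of an off-diagonal adjustment in a CLIP-style contrastive loss
# (one plausible reading of the abstract; the exact formulation is in the paper).
import torch
import torch.nn.functional as F

def off_diagonal_adjusted_loss(img, txt, is_normal, tau=0.07):
    # img, txt: (B, D) L2-normalized embeddings; is_normal: (B,) bool labels
    logits = img @ txt.T / tau                       # (B, B) similarity matrix
    mask = is_normal[:, None] & is_normal[None, :]   # normal-normal pairs
    mask.fill_diagonal_(False)                       # keep matched pairs as positives
    logits = logits.masked_fill(mask, float('-inf')) # drop them from the negatives
    target = torch.arange(len(img), device=img.device)
    return F.cross_entropy(logits, target)
```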
arXiv Detail & Related papers (2025-03-03T18:24:11Z)
- KAnoCLIP: Zero-Shot Anomaly Detection through Knowledge-Driven Prompt Learning and Enhanced Cross-Modal Integration [9.688664292809785]
Zero-shot anomaly detection (ZSAD) identifies anomalies without needing training samples from the target dataset. Vision-language models like CLIP show potential in ZSAD but have limitations. We introduce KAnoCLIP, a novel ZSAD framework that leverages vision-language models. KAnoCLIP achieves state-of-the-art performance in ZSAD across 12 industrial and medical datasets.
arXiv Detail & Related papers (2025-01-07T13:51:41Z)
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [19.749490092520006]
Self-Calibrated CLIP (SC-CLIP) is a training-free method that calibrates CLIP to produce finer representations. SC-CLIP boosts the performance of vanilla CLIP ViT-L/14 by 6.8 times.
arXiv Detail & Related papers (2024-11-24T15:14:05Z)
- C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection [98.34703790782254]
We introduce Category Common Prompt CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder. Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing.
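A category common prompt can be sketched as a fixed pair of category texts that anchor the image features during training. The snippet below is an assumed, minimal form (not the released C2P-CLIP code); the prompt texts are hypothetical, and encode_text and the tokenizer follow the usual CLIP-style API.

```python
# Minimal sketch of a "category common prompt" objective (assumed form):
# fixed category texts are encoded once, and image features are trained
# to match their category's text embedding.
import torch
import torch.nn.functional as F

@torch.no_grad()
def encode_category_prompts(clip_model, tokenizer,
                            categories=("a real photo", "a deepfake image")):
    tokens = tokenizer(list(categories))             # (2, L) token ids, CLIP-style API
    return F.normalize(clip_model.encode_text(tokens), dim=-1)

def category_prompt_loss(image_feats, text_anchors, labels, tau=0.07):
    # image_feats: (B, D); text_anchors: (2, D); labels: (B,) 0=real, 1=fake
    logits = F.normalize(image_feats, dim=-1) @ text_anchors.T / tau
    return F.cross_entropy(logits, labels)
```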
arXiv Detail & Related papers (2024-08-19T02:14:25Z)
- AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection [14.916862007773341]
This study introduces AdaCLIP for the ZSAD task, leveraging a pre-trained vision-language model (VLM), CLIP.
AdaCLIP incorporates learnable prompts into CLIP and optimizes them through training on auxiliary annotated anomaly detection data.
Experiments conducted across 14 real-world anomaly detection datasets from industrial and medical domains indicate that AdaCLIP outperforms other ZSAD methods.
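The hybrid learnable prompts named in the title can be sketched as a mix of shared (static) tokens and image-conditioned (dynamic) tokens, trained while the CLIP backbone stays frozen. All names and sizes below are assumptions, not AdaCLIP's code.

```python
# Sketch of hybrid (static + dynamic) learnable prompts; assumed details.
import torch
import torch.nn as nn

class HybridPrompts(nn.Module):
    def __init__(self, n_static=8, n_dynamic=4, dim=512):
        super().__init__()
        self.static = nn.Parameter(torch.randn(n_static, dim) * 0.02)  # shared across images
        self.proj = nn.Linear(dim, n_dynamic * dim)                    # image-specific tokens
        self.n_dynamic, self.dim = n_dynamic, dim

    def forward(self, image_emb):                    # image_emb: (B, D)
        dyn = self.proj(image_emb).view(-1, self.n_dynamic, self.dim)
        sta = self.static.unsqueeze(0).expand(len(image_emb), -1, -1)
        return torch.cat([sta, dyn], dim=1)          # (B, n_static + n_dynamic, D)

# Training outline per the abstract: freeze CLIP, optimize only the prompt
# parameters on auxiliary annotated AD data (scoring function hypothetical):
# loss = bce(anomaly_scores(clip, images, prompts), labels)
```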
arXiv Detail & Related papers (2024-07-22T16:52:37Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism for backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
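Those two components can be drawn schematically. The sketch below uses assumed shapes and is not the authors' implementation: a learnable additive trigger is applied to images, and a generator maps the trigger into extra context tokens for the text prompts.

```python
# Schematic of a learnable image trigger plus a trigger-aware context
# generator; shapes and names are assumptions.
import torch
import torch.nn as nn

class TriggerAndPrompt(nn.Module):
    def __init__(self, img_size=224, n_ctx=4, dim=512):
        super().__init__()
        self.trigger = nn.Parameter(torch.zeros(3, img_size, img_size))  # learnable trigger
        self.gen = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_ctx * dim))
        self.n_ctx, self.dim = n_ctx, dim

    def forward(self, images):
        poisoned = (images + self.trigger).clamp(0, 1)   # apply trigger to images
        ctx = self.gen(self.trigger.unsqueeze(0))        # trigger-aware context tokens
        return poisoned, ctx.view(1, self.n_ctx, self.dim)
```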
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- Bootstrap Fine-Grained Vision-Language Alignment for Unified Zero-Shot Anomaly Localization [63.61093388441298]
Contrastive Language-Image Pre-training models have shown promising performance on zero-shot visual recognition tasks.
In this work, we propose AnoCLIP for zero-shot anomaly localization.
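Zero-shot anomaly localization of this kind generally scores CLIP patch tokens against normal/abnormal text embeddings and upsamples the result into a heatmap. The following is a generic sketch in that spirit, not AnoCLIP's exact pipeline.

```python
# Generic zero-shot anomaly-localization scoring (illustrative only).
import torch
import torch.nn.functional as F

def localize(patch_feats, text_feats, image_hw=(224, 224)):
    # patch_feats: (B, N, D) patch tokens; text_feats: (2, D) normal/abnormal texts
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    probs = (p @ t.T).softmax(-1)[..., 1]          # (B, N) abnormal probability
    side = int(probs.shape[1] ** 0.5)              # assumes a square patch grid
    maps = probs.view(-1, 1, side, side)
    return F.interpolate(maps, image_hw, mode='bilinear', align_corners=False)
```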
arXiv Detail & Related papers (2023-08-30T10:35:36Z)
- VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection [58.47940430618352]
We propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD).
VadCLIP makes full use of fine-grained associations between vision and language on the strength of CLIP.
We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD.
arXiv Detail & Related papers (2023-08-22T14:58:36Z)