AFR-CLIP: Enhancing Zero-Shot Industrial Anomaly Detection with Stateless-to-Stateful Anomaly Feature Rectification
- URL: http://arxiv.org/abs/2503.12910v3
- Date: Mon, 18 Aug 2025 02:02:38 GMT
- Title: AFR-CLIP: Enhancing Zero-Shot Industrial Anomaly Detection with Stateless-to-Stateful Anomaly Feature Rectification
- Authors: Jingyi Yuan, Chenqiang Gao, Pengyu Jie, Xuan Xia, Shangri Huang, Wanquan Liu
- Abstract summary: We propose AFR-CLIP, a CLIP-based anomaly feature rectification framework. Existing CLIP-based ZSAD methods generate anomaly maps by measuring the cosine similarity between visual and textual features. Experiments are conducted on eleven anomaly detection benchmarks across industrial and medical domains.
- Score: 11.844008592270555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, zero-shot anomaly detection (ZSAD) has emerged as a pivotal paradigm for industrial inspection and medical diagnostics, detecting defects in novel objects without requiring any target-dataset samples during training. Existing CLIP-based ZSAD methods generate anomaly maps by measuring the cosine similarity between visual and textual features. However, CLIP's alignment with object categories instead of their anomalous states limits its effectiveness for anomaly detection. To address this limitation, we propose AFR-CLIP, a CLIP-based anomaly feature rectification framework. AFR-CLIP first performs image-guided textual rectification, embedding the implicit defect information from the image into a stateless prompt that describes the object category without indicating any anomalous state. The enriched textual embeddings are then compared with two pre-defined stateful (normal or abnormal) embeddings, and their text-on-text similarity yields the anomaly map that highlights defective regions. To further enhance the perception of multi-scale features and complex anomalies, we introduce self-prompting (SP) and multi-patch feature aggregation (MPFA) modules. Extensive experiments are conducted on eleven anomaly detection benchmarks across industrial and medical domains, demonstrating AFR-CLIP's superiority in ZSAD.
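The similarity-based scoring described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `anomaly_map`, the softmax over two state embeddings, and the `temperature` value are assumptions made for illustration. The per-location features passed in could stand for visual patch features (as in existing CLIP-based methods) or for image-enriched textual embeddings (as in the text-on-text comparison that AFR-CLIP describes).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anomaly_map(location_feats, normal_emb, abnormal_emb, grid_hw, temperature=0.07):
    """Score each spatial location against "normal" and "abnormal" embeddings.

    location_feats: (H*W, D) array, one feature vector per location.
    normal_emb, abnormal_emb: (D,) stateful embeddings (hypothetical inputs).
    grid_hw: (H, W) shape of the location grid.
    temperature: softmax temperature (an assumed value, not from the paper).
    Returns an (H, W) map of abnormal-state probabilities in [0, 1].
    """
    h, w = grid_hw
    feats = l2_normalize(location_feats)
    states = l2_normalize(np.stack([normal_emb, abnormal_emb]))  # (2, D)

    # Cosine similarity of every location to both states, scaled by temperature.
    logits = feats @ states.T / temperature  # (H*W, 2)

    # Numerically stable softmax over the two states.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Column 1 holds the probability of the abnormal state.
    return probs[:, 1].reshape(h, w)


# Toy usage with random vectors standing in for real CLIP embeddings.
rng = np.random.default_rng(0)
dim = 16
normal = rng.normal(size=dim)
abnormal = rng.normal(size=dim)
feats = np.tile(normal, (16, 1))  # a 4x4 grid of "normal" locations
feats[5] = abnormal               # one "defective" location at row 1, col 1
scores = anomaly_map(feats, normal, abnormal, grid_hw=(4, 4))
print(scores.round(3))
```

In this toy setup the location that matches the abnormal embedding receives a score above 0.5 and the others fall below it. A real pipeline would typically upsample the map to the input resolution.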
Related papers
- DevPrompt: Deviation-Based Prompt Learning for One-Normal Shot Image Anomaly Detection [0.0]
Few-normal shot anomaly detection (FNSAD) aims to detect abnormal regions in images using only a few normal training samples. Recent approaches leverage vision-language models such as CLIP with prompt-based learning to align image and text features. We propose a deviation-guided prompt learning framework that integrates the semantic power of vision-language models with the statistical reliability of deviation-based scoring.
arXiv Detail & Related papers (2026-01-21T20:35:51Z) - Unified Unsupervised Anomaly Detection via Matching Cost Filtering [113.43366521994396]
Unsupervised anomaly detection (UAD) aims to identify image- and pixel-level anomalies using only normal training data. We present Unified Cost Filtering (UCF), a generic post-hoc framework for refining the anomaly cost volume of any UAD model.
arXiv Detail & Related papers (2025-10-03T03:28:18Z) - AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation [8.252046294696585]
We propose AF-CLIP (Anomaly-Focused CLIP), which enhances CLIP's visual representations to focus on local defects. Our approach introduces a lightweight adapter that emphasizes anomaly-relevant patterns in visual features. Our method is also extended to few-shot scenarios with additional memory banks.
arXiv Detail & Related papers (2025-07-26T13:34:38Z) - Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection [53.137651284042434]
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples limits the effectiveness of existing methods. We propose Generate Aligned Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework. GAA generates realistic, diverse, and semantically aligned anomalies using only a small number of samples.
arXiv Detail & Related papers (2025-07-13T12:56:59Z) - FastRef: Fast Prototype Refinement for Few-Shot Industrial Anomaly Detection [18.487111110151115]
Few-shot industrial anomaly detection (FS-IAD) presents a critical challenge for practical automated inspection systems. We propose FastRef, a novel and efficient prototype refinement framework for FS-IAD. For comprehensive evaluation, we integrate FastRef with competitive prototype-based FS-IAD methods: PatchCore, FastRecon, WinCLIP, and AnomalyDINO.
arXiv Detail & Related papers (2025-06-26T15:46:28Z) - CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z) - MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning [27.235318937019255]
It is crucial to know the distinct type of defect, such as a bend, cut, or scratch.
The ability to recognize the "exact" defect type enables automated treatments of the anomalies in modern production lines.
We propose MultiADS, a zero-shot learning approach, able to perform Multi-type Anomaly Detection.
arXiv Detail & Related papers (2025-04-09T09:52:04Z) - Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation [61.64052577026623]
Real-world multi-view datasets are often heterogeneous and imperfect.
We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment.
In experiments, we employ it in unsupervised multi-view clustering, noise-label classification, and as a plug-and-play module for cross-modal hashing retrieval.
arXiv Detail & Related papers (2025-03-06T07:01:08Z) - PA-CLIP: Enhancing Zero-Shot Anomaly Detection through Pseudo-Anomaly Awareness [10.364634539199422]
We introduce PA-CLIP, a zero-shot anomaly detection method that reduces background noise and enhances defect detection through a pseudo-anomaly-based framework. The proposed method integrates a multiscale feature aggregation strategy for capturing detailed global and local information. It outperforms existing zero-shot methods, providing a robust solution for industrial defect detection.
arXiv Detail & Related papers (2025-03-03T08:29:27Z) - Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection [109.72772150095646]
FAPrompt is a novel framework designed to learn Fine-grained Abnormality Prompts for accurate ZSAD. Experiments on 19 real-world datasets, covering both industrial defects and medical anomalies, demonstrate that FAPrompt substantially outperforms state-of-the-art methods in both image- and pixel-level ZSAD tasks.
arXiv Detail & Related papers (2024-10-14T08:41:31Z) - VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection [19.79027968793026]
Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects.
Existing ZSAD methods are limited by closed-world settings and struggle to handle unseen defects with predefined prompts.
We propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception.
arXiv Detail & Related papers (2024-09-30T09:51:29Z) - Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
arXiv Detail & Related papers (2024-08-05T08:35:59Z) - Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection [6.865429486202104]
We introduce Prior Normality Prompt Transformer (PNPT) for multi-class anomaly detection.
PNPT strategically incorporates normal semantics prompting to mitigate the "identical mapping" problem.
This entails integrating a prior normality prompt into the reconstruction process, yielding a dual-stream model.
arXiv Detail & Related papers (2024-06-17T13:10:04Z) - FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization [31.854923603517264]
We propose a novel zero-shot anomaly detection (ZSAD) method called FiLo.
FiLo comprises two components: adaptively learned Fine-Grained Description (FG-Des) and position-enhanced High-Quality Localization (HQ-Loc).
Experimental results on datasets like MVTec and VisA demonstrate that FiLo significantly improves the performance of ZSAD in both detection and localization.
arXiv Detail & Related papers (2024-04-21T14:22:04Z) - PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection [59.34973469354926]
This paper proposes a one-class prompt learning method for few-shot anomaly detection, termed PromptAD.
For image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings on MVTec and VisA.
arXiv Detail & Related papers (2024-04-08T06:53:30Z) - Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD).
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic spatio-temporal pseudo-anomalies (PAs) by inpainting a masked-out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with existing state-of-the-art PA-generation and reconstruction-based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Hard-normal Example-aware Template Mutual Matching for Industrial Anomaly Detection [78.734927709231]
Anomaly detectors are widely used in industrial manufacturing to detect and localize unknown defects in query images. These detectors are trained on anomaly-free samples and have successfully distinguished anomalies from most normal samples. However, hard-normal examples are scattered and far apart from most normal samples, and thus they are often mistaken for anomalies by existing methods.
arXiv Detail & Related papers (2023-03-28T17:54:56Z) - BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z) - UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone Imagery [26.27705791338182]
This paper proposes a novel approach to object detection on drone imagery, namely Multi-Proxy Detection Network with Unified Foreground Packing (UFPMP-Det).
UFPMP-Det is designed to deal with the numerous instances of very small scales, different from the common solution that divides the high-resolution input image into quite a number of chips with low foreground ratios to perform detection on them each.
Experiments are carried out on the widely used VisDrone and UAVDT datasets, and UFPMP-Det reports new state-of-the-art scores at a much higher speed, highlighting its advantages.
arXiv Detail & Related papers (2021-12-20T09:28:44Z) - Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)