Related papers: Search is All You Need for Few-shot Anomaly Detection

Search is All You Need for Few-shot Anomaly Detection

URL: http://arxiv.org/abs/2504.11895v1
Date: Wed, 16 Apr 2025 09:21:34 GMT
Title: Search is All You Need for Few-shot Anomaly Detection
Authors: Qishan Wang, Jia Guo, Shuyong Gao, Haofen Wang, Li Xiong, Junjie Hu, Hanqi Guo, Wenqiang Zhang,
Abstract summary: Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection.<n>We show that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios.<n>Our method achieves remarkable image-level AUROC scores of 97.4%, 94.8%, and 70.8% respectively.
Score: 39.737510049667556
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection, where normal distribution modeling must be accomplished with only a few normal images. While existing approaches typically employ multi-modal foundation models combining language and vision modalities for prompt-guided anomaly detection, these methods often demand sophisticated prompt engineering and extensive manual tuning. In this paper, we demonstrate that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios. Our proposed method, VisionAD, consists of four simple yet essential components: (1) scalable vision foundation models that extract universal and discriminative features; (2) dual augmentation strategies - support augmentation to enhance feature matching adaptability and query augmentation to address the oversights of single-view prediction; (3) multi-layer feature integration that captures both low-frequency global context and high-frequency local details with minimal computational overhead; and (4) a class-aware visual memory bank enabling efficient one-for-all multi-class detection. Extensive evaluations across MVTec-AD, VisA, and Real-IAD benchmarks demonstrate VisionAD's exceptional performance. Using only 1 normal images as support, our method achieves remarkable image-level AUROC scores of 97.4%, 94.8%, and 70.8% respectively, outperforming current state-of-the-art approaches by significant margins (+1.6%, +3.2%, and +1.4%). The training-free nature and superior few-shot capabilities of VisionAD make it particularly appealing for real-world applications where samples are scarce or expensive to obtain. Code is available at https://github.com/Qiqigeww/VisionAD.

Related papers

Visual Instruction Tuning with Chain of Region-of-Interest [9.111425648646229]
We propose a method called Chain of Region-of-Interest (CoRoI) for Visual Instruction Tuning.<n>CoRoI seeks to identify and prioritize the most informative regions, thereby enhancing multimodal visual comprehension and recognition.<n>Our models consistently demonstrate superior performance across diverse multimodal benchmarks and tasks.
arXiv Detail & Related papers (2025-05-11T04:44:03Z)
Exploring the Collaborative Advantage of Low-level Information on Generalizable AI-Generated Image Detection [46.5480496076675]
Existing AI-generated image detection methods consider only a single type of low-level information. Different low-level information often exhibits generalization capabilities for different types of forgeries. We propose the Adaptive Low-level Experts Injection framework, enabling the backbone network to accept and learn knowledge from different low-level information.
arXiv Detail & Related papers (2025-04-01T06:38:08Z)
EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning.<n>Our approach only leverages a small number of samples to search for the desired pruning policy.<n>We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z)
TAVP: Task-Adaptive Visual Prompt for Cross-domain Few-shot Segmentation [40.49924427388922]
We propose a task-adaptive auto-visual prompt framework for Cross-dominan Few-shot segmentation (CD-FSS)<n>We incorporate a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module to enable class-domain feature extraction and generate high-quality, learnable visual prompts.<n>Our model outperforms the state-of-the-art CD-FSS approach, achieving an average accuracy improvement of 1.3% in the 1-shot setting and 11.76% in the 5-shot setting.
arXiv Detail & Related papers (2024-09-09T07:43:58Z)
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2 [16.69402464709241]
We adapt DINOv2 for one-shot and few-shot anomaly detection, with a focus on industrial applications.<n>Our proposed vision-only approach, AnomalyDINO, follows the well-established patch-level deep nearest neighbor paradigm.<n>We show that this approach does not only rival existing techniques but can even outmatch them in many settings.
arXiv Detail & Related papers (2024-05-23T13:15:13Z)
Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system. Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context. Our approach significantly exploits the potential of vision-language joint anomaly detection and demonstrates comparable performance with current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
Anomaly Detection by Adapting a pre-trained Vision Language Model [48.225404732089515]
We present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model. We introduce the learnable prompt and propose to associate it with abnormal patterns through self-supervised learning. We achieve the state-of-the-art 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization.
arXiv Detail & Related papers (2024-03-14T15:35:07Z)
Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model. Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues. No human annotations are involved in our framework during the whole training process. Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture. We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions. Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.