Towards Training-free Anomaly Detection with Vision and Language Foundation Models
- URL: http://arxiv.org/abs/2503.18325v1
- Date: Mon, 24 Mar 2025 04:07:59 GMT
- Title: Towards Training-free Anomaly Detection with Vision and Language Foundation Models
- Authors: Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang,
- Abstract summary: Anomaly detection is valuable for real-world applications, such as industrial quality inspection.<n>We introduce LogSAD, a novel multi-modal framework that requires no training for both Logical and Structural Anomaly Detection.
- Score: 17.991678161890174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Anomaly detection is valuable for real-world applications, such as industrial quality inspection. However, most approaches focus on detecting local structural anomalies while neglecting compositional anomalies incorporating logical constraints. In this paper, we introduce LogSAD, a novel multi-modal framework that requires no training for both Logical and Structural Anomaly Detection. First, we propose a match-of-thought architecture that employs advanced large multi-modal models (i.e. GPT-4V) to generate matching proposals, formulating interests and compositional rules of thought for anomaly detection. Second, we elaborate on multi-granularity anomaly detection, consisting of patch tokens, sets of interests, and composition matching with vision and language foundation models. Subsequently, we present a calibration module to align anomaly scores from different detectors, followed by integration strategies for the final decision. Consequently, our approach addresses both logical and structural anomaly detection within a unified framework and achieves state-of-the-art results without the need for training, even when compared to supervised approaches, highlighting its robustness and effectiveness. Code is available at https://github.com/zhang0jhon/LogSAD.
Related papers
- LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection [27.45348890285863]
We introduce Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning.
We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B.
Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score.
arXiv Detail & Related papers (2025-04-17T08:41:23Z) - Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks [49.84182981950623]
Multispectral object detection, utilizing RGB and TIR (thermal infrared) modalities, is widely recognized as a challenging task.<n>It requires not only the effective extraction of features from both modalities and robust fusion strategies, but also the ability to address issues such as spectral discrepancies.<n>We introduce an efficient and easily deployable multispectral object detection framework that can seamlessly optimize high-performing single-modality models.
arXiv Detail & Related papers (2024-11-27T12:18:39Z) - AAD-LLM: Adaptive Anomaly Detection Using Large Language Models [35.286105732902065]
The research aims to improve the transferability of anomaly detection models by leveraging Large Language Models (LLMs)
The research also seeks to enable more collaborative decision-making between the model and plant operators.
arXiv Detail & Related papers (2024-11-01T13:43:28Z) - Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection [2.3020018305241337]
Industrial anomaly detection is crucial for quality control and predictive maintenance.
Existing methods commonly detect structural anomalies, such as dents and scratches, by leveraging multi-scale features from image patches extracted through deep pre-trained networks.
We address these limitations by focusing on Deep Feature Reconstruction (DFR), a memory- and compute-efficient approach for detecting structural anomalies.
We further enhance DFR into a unified framework, called ULSAD, which is capable of detecting both structural and logical anomalies.
arXiv Detail & Related papers (2024-10-21T17:56:47Z) - GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that employs straightforward operations like noise addition and shuffling to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results in six and on-par performance in the remaining.
arXiv Detail & Related papers (2024-07-17T09:27:41Z) - Generating and Reweighting Dense Contrastive Patterns for Unsupervised
Anomaly Detection [59.34318192698142]
We introduce a prior-less anomaly generation paradigm and develop an innovative unsupervised anomaly detection framework named GRAD.
PatchDiff effectively expose various types of anomaly patterns.
experiments on both MVTec AD and MVTec LOCO datasets also support the aforementioned observation.
arXiv Detail & Related papers (2023-12-26T07:08:06Z) - Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Component-aware anomaly detection framework for adjustable and logical
industrial visual inspection [4.444590838289701]
We propose a novel component-aware anomaly detection framework (ComAD)
It can simultaneously achieve adjustable and logical anomaly detection for industrial scenarios.
Our framework achieves state-of-the-art performance on image-level logical anomaly detection.
arXiv Detail & Related papers (2023-05-15T10:18:52Z) - Learning Global-Local Correspondence with Semantic Bottleneck for
Logical Anomaly Detection [6.553276620691242]
This paper presents a novel framework, named Global-Local Correspondence Framework (GLCF), for visual anomaly detection with logical constraints.
Visual anomaly detection has become an active research area in various real-world applications, such as industrial anomaly detection and medical disease diagnosis.
arXiv Detail & Related papers (2023-03-10T08:09:40Z) - Self-Supervised Predictive Convolutional Attentive Block for Anomaly
Detection [97.93062818228015]
We propose to integrate the reconstruction-based functionality into a novel self-supervised predictive architectural building block.
Our block is equipped with a loss that minimizes the reconstruction error with respect to the masked area in the receptive field.
We demonstrate the generality of our block by integrating it into several state-of-the-art frameworks for anomaly detection on image and video.
arXiv Detail & Related papers (2021-11-17T13:30:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.