Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2412.00890v1
- Date: Sun, 01 Dec 2024 17:00:43 GMT
- Title: Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
- Authors: Kun Qian, Tianyu Sun, Wenhong Wang,
- Abstract summary: We propose Vision-Language Anomaly Detection via Contrastive Cross-Modal Training (CLAD)<n> CLAD aligns visual and textual features into a shared embedding space using contrastive learning.<n>We demonstrate that CLAD outperforms state-of-the-art methods in both image-level anomaly detection and pixel-level anomaly localization.
- Score: 4.691083532629246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial anomaly detection (IAD) plays a crucial role in the maintenance and quality control of manufacturing processes. In this paper, we propose a novel approach, Vision-Language Anomaly Detection via Contrastive Cross-Modal Training (CLAD), which leverages large vision-language models (LVLMs) to improve both anomaly detection and localization in industrial settings. CLAD aligns visual and textual features into a shared embedding space using contrastive learning, ensuring that normal instances are grouped together while anomalies are pushed apart. Through extensive experiments on two benchmark industrial datasets, MVTec-AD and VisA, we demonstrate that CLAD outperforms state-of-the-art methods in both image-level anomaly detection and pixel-level anomaly localization. Additionally, we provide ablation studies and human evaluation to validate the importance of key components in our method. Our approach not only achieves superior performance but also enhances interpretability by accurately localizing anomalies, making it a promising solution for real-world industrial applications.
Related papers
- Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection [40.174488947319645]
Anomaly detection (AD) is essential for automating visual inspection in manufacturing.
This paper makes three key contributions: (1) we demonstrate the importance of real-world datasets and establish benchmarks using actual production data; (2) we provide a fair comparison of existing SOTA methods across diverse tasks by utilizing metrics that are valuable for practical applications; and (3) we present a comprehensive analysis of recent advancements in this field by discussing important challenges and new perspectives for bridging the academia-industry gap.
arXiv Detail & Related papers (2025-03-30T14:11:46Z) - Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift [51.24522135151649]
Anomaly detection plays a crucial role in quality control for industrial applications.
Existing methods attempt to address domain shifts by training generalizable models.
Our proposed method demonstrates superior results compared with state-of-the-art anomaly detection and domain adaptation methods.
arXiv Detail & Related papers (2025-03-19T05:25:52Z) - EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models [23.898938659720503]
Industrial Anomaly Detection (IAD) is critical to ensure product quality during manufacturing.
We propose a novel approach that introduces a dedicated multi-modal defect localization module to decouple the dialog functionality from the core feature extraction.
We also contribute to the first multi-modal industrial anomaly detection training dataset, named Defect Detection Question Answering (DDQA)
arXiv Detail & Related papers (2025-03-18T11:33:29Z) - VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection [19.79027968793026]
Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects.
Existing ZSAD methods are limited by closed-world settings, struggling to unseen defects with predefined prompts.
We propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception.
arXiv Detail & Related papers (2024-09-30T09:51:29Z) - Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z) - Anomaly Detection by Adapting a pre-trained Vision Language Model [48.225404732089515]
We present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model.
We introduce the learnable prompt and propose to associate it with abnormal patterns through self-supervised learning.
We achieve the state-of-the-art 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization.
arXiv Detail & Related papers (2024-03-14T15:35:07Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - SCL-VI: Self-supervised Context Learning for Visual Inspection of
Industrial Defects [4.487908181569429]
We present a novel self-supervised learning algorithm designed to derive an optimal encoder by tackling the renowned jigsaw puzzle.
Our approach involves dividing the target image into nine patches, tasking the encoder with predicting the relative position relationships between any two patches to extract rich semantics.
arXiv Detail & Related papers (2023-11-11T08:01:40Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.