LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
- URL: http://arxiv.org/abs/2501.01767v2
- Date: Wed, 08 Jan 2025 12:11:18 GMT
- Title: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
- Authors: Er Jin, Qihui Feng, Yongli Mou, Stefan Decker, Gerhard Lakemeyer, Oliver Simons, Johannes Stegmaier
- Abstract summary: Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning. In this work, we investigate using AVLMs for logical anomaly detection and demonstrate that they are well-suited to the task. We achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.0% and F1-max of 83.7%, along with explanations of anomalies.
- Score: 4.959108380494595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Logical image understanding involves interpreting and reasoning about the relationships and consistency within an image's visual content. This capability is essential in applications such as industrial inspection, where logical anomaly detection is critical for maintaining high-quality standards and minimizing costly recalls. Previous research in anomaly detection (AD) has relied on prior knowledge for designing algorithms, which often requires extensive manual annotations, significant computing power, and large amounts of data for training. Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning across various domains. Despite this, their application to logical AD remains unexplored. In this work, we investigate using AVLMs for logical AD and demonstrate that they are well-suited to the task. Combining AVLMs with format embedding and a logic reasoner, we achieve SOTA performance on the public benchmark MVTec LOCO AD, with an AUROC of 86.0% and F1-max of 83.7%, along with explanations of anomalies. This outperforms the existing SOTA method by a large margin.
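The AUROC and F1-max figures quoted in the abstract are standard image-level detection metrics. The paper's evaluation code is not shown here, so the following is only a minimal sketch of the textbook definitions, assuming binary labels (1 = anomalous, 0 = normal) and a scalar anomaly score per image. The function names `auroc` and `f1_max` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (anomalous, normal) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_max(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Best F1 over all thresholds t, predicting 'anomalous' when score >= t."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data (made up for illustration, not results from the paper).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(f"AUROC  = {auroc(labels, scores):.3f}")
print(f"F1-max = {f1_max(labels, scores):.3f}")
```

The pairwise AUROC loop is quadratic in the number of samples, which is acceptable for a small test set; a rank-based formulation would be the usual choice at scale.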
Related papers
- LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection [27.45348890285863]
We introduce Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning.
We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B.
Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score.
arXiv Detail & Related papers (2025-04-17T08:41:23Z)
- LogicQA: Logical Anomaly Detection with Vision Language Model Generated Questions [4.63822109539229]
We introduce LogicQA, a framework that enhances Anomaly Detection (AD)
LogicQA compiles automatically generated questions into a checklist and collects responses to identify violations of logical constraints.
We achieve state-of-the-art (SOTA) Logical AD performance on public benchmarks, MVTec LOCO AD, with an AUROC of 87.6 percent and an F1-max of 87.0 percent along with the explanations of anomalies.
arXiv Detail & Related papers (2025-03-26T05:38:45Z)
- VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning [63.0285363282581]
Multimodal Large Language Models (MLLMs) have become a powerful tool for integrating visual and textual information.
We introduce VOILA, a benchmark designed to evaluate MLLMs' perceptual understanding and abstract relational reasoning.
We reveal that current MLLMs struggle to comprehend inter-image relationships and exhibit limited capabilities in high-level relational reasoning.
arXiv Detail & Related papers (2025-02-25T23:36:19Z)
- Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models [29.078437003042357]
Zero-Shot Anomaly Detection (ZSAD) is an emerging AD paradigm.
We propose Anomaly-OneVision (Anomaly-OV), the first specialist visual assistant for ZSAD and reasoning.
arXiv Detail & Related papers (2025-02-11T14:50:43Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection [19.79027968793026]
Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects.
Existing ZSAD methods are limited by closed-world settings and struggle to handle unseen defects with predefined prompts.
We propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception.
arXiv Detail & Related papers (2024-09-30T09:51:29Z)
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms other state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
- Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
- Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection [89.49244928440221]
We propose a novel large multi-modal model by applying vision experts for industrial anomaly detection (dubbed Myriad).
Specifically, we adopt MiniGPT-4 as the base LMM and design an Expert Perception module to embed the prior knowledge from vision experts as tokens which are intelligible to Large Language Models (LLMs).
To compensate for the errors and confusions of vision experts, we introduce a domain adapter to bridge the visual representation gaps between generic and industrial images.
arXiv Detail & Related papers (2023-10-29T16:49:45Z)
- AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models [30.723122000372538]
AnomalyGPT is a novel IAD approach based on Large Vision-Language Models (LVLMs).
We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image.
AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
arXiv Detail & Related papers (2023-08-29T15:02:53Z)
- IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing [88.35145788575348]
Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing.
The lack of a uniform IM benchmark is hindering the development and usage of IAD methods in real-world applications.
We construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets.
arXiv Detail & Related papers (2023-01-31T01:24:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.