LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
- URL: http://arxiv.org/abs/2501.01767v2
- Date: Wed, 08 Jan 2025 12:11:18 GMT
- Title: LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
- Authors: Er Jin, Qihui Feng, Yongli Mou, Stefan Decker, Gerhard Lakemeyer, Oliver Simons, Johannes Stegmaier,
- Abstract summary: Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning.
In this work, we investigate using AVLMs for logical anomaly detection and demonstrate that they are well-suited to the task.
We achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.4% and F1-max of 83.7%, along with explanations of anomalies.
- Score: 4.959108380494595
- License:
- Abstract: Logical image understanding involves interpreting and reasoning about the relationships and consistency within an image's visual content. This capability is essential in applications such as industrial inspection, where logical anomaly detection is critical for maintaining high-quality standards and minimizing costly recalls. Previous research in anomaly detection (AD) has relied on prior knowledge for designing algorithms, which often requires extensive manual annotations, significant computing power, and large amounts of data for training. Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning across various domains. Despite this, their application to logical AD remains unexplored. In this work, we investigate using AVLMs for logical AD and demonstrate that they are well-suited to the task. Combining AVLMs with format embedding and a logic reasoner, we achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.0% and F1-max of 83.7%, along with explanations of anomalies. This significantly outperforms the existing SOTA method by a large margin.
Related papers
- Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models [29.078437003042357]
Zero-Shot Anomaly Detection (ZSAD) is an emerging AD paradigm.
We propose Anomaly-OneVision (Anomaly-OV), the first specialist visual assistant for ZSAD and reasoning.
arXiv Detail & Related papers (2025-02-11T14:50:43Z) - Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? [5.979778557940213]
Traditional industrial anomaly detection models often struggle with flexibility and adaptability.
Recent advancements in Multimodal Large Language Models (MLLMs) hold promise for overcoming these limitations.
We propose Echo, a novel multi-expert framework designed to enhance MLLM performance for IAD.
arXiv Detail & Related papers (2025-01-27T05:41:10Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection [19.79027968793026]
Zero-shot anomaly detection (ZSAD) recognizes and localizes anomalies in previously unseen objects.
Existing ZSAD methods are limited by closed-world settings, struggling to unseen defects with predefined prompts.
We propose a novel framework VMAD (Visual-enhanced MLLM Anomaly Detection) that enhances MLLM with visual-based IAD knowledge and fine-grained perception.
arXiv Detail & Related papers (2024-09-30T09:51:29Z) - MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z) - Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z) - AcME-AD: Accelerated Model Explanations for Anomaly Detection [5.702288833888639]
AcME-AD is a model-agnostic, efficient solution for interoperability.
It offers local feature importance scores and a what-if analysis tool, shedding light on the factors contributing to each anomaly.
This paper elucidates AcME-AD's foundation, its benefits over existing methods, and validates its effectiveness with tests on both synthetic and real datasets.
arXiv Detail & Related papers (2024-03-02T16:11:58Z) - Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection [86.24898024621008]
We present a novel large multimodal model applying vision experts for industrial anomaly detection(abbreviated to Myriad)
We utilize the anomaly map generated by the vision experts as guidance for LMMs, such that the vision model is guided to pay more attention to anomalous regions.
Our proposed method not only performs favorably against state-of-the-art methods, but also inherits the flexibility and instruction-following ability of LMMs in the field of IAD.
arXiv Detail & Related papers (2023-10-29T16:49:45Z) - AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language
Models [30.723122000372538]
AnomalyGPT is a novel IAD approach based on Large Vision-Language Models (LVLM)
We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image.
AnomalyGPT achieves the state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
arXiv Detail & Related papers (2023-08-29T15:02:53Z) - Building Interpretable and Reliable Open Information Retriever for New
Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many down-stream tasks such as open-domain question answering (QA)
We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query.
We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z) - IM-IAD: Industrial Image Anomaly Detection Benchmark in Manufacturing [88.35145788575348]
Image anomaly detection (IAD) is an emerging and vital computer vision task in industrial manufacturing.
The lack of a uniform IM benchmark is hindering the development and usage of IAD methods in real-world applications.
We construct a comprehensive image anomaly detection benchmark (IM-IAD), which includes 19 algorithms on seven major datasets.
arXiv Detail & Related papers (2023-01-31T01:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.