LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection
- URL: http://arxiv.org/abs/2504.12749v1
- Date: Thu, 17 Apr 2025 08:41:23 GMT
- Title: LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection
- Authors: Weijia Li, Guanglei Chu, Jiong Chen, Guo-Sen Xie, Caifeng Shan, Fang Zhao,
- Abstract summary: We introduce Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning.<n>We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B.<n> Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score.
- Score: 27.45348890285863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in industrial anomaly detection have highlighted the need for deeper logical anomaly analysis, where unexpected relationships among objects, counts, and spatial configurations must be identified and explained. Existing approaches often rely on large-scale external reasoning modules or elaborate pipeline designs, hindering practical deployment and interpretability. To address these limitations, we introduce a new task, Reasoning Logical Anomaly Detection (RLAD), which extends traditional anomaly detection by incorporating logical reasoning. We propose a new framework, LAD-Reasoner, a customized tiny multimodal language model built on Qwen2.5-VL 3B. Our approach leverages a two-stage training paradigm that first employs Supervised Fine-Tuning (SFT) for fine-grained visual understanding, followed by Group Relative Policy Optimization (GRPO) to refine logical anomaly detection and enforce coherent, human-readable reasoning. Crucially, reward signals are derived from both the detection accuracy and the structural quality of the outputs, obviating the need for building chain of thought (CoT) reasoning data. Experiments on the MVTec LOCO AD dataset show that LAD-Reasoner, though significantly smaller, matches the performance of Qwen2.5-VL-72B in accuracy and F1 score, and further excels in producing concise and interpretable rationales. This unified design reduces reliance on large models and complex pipelines, while offering transparent and interpretable insights into logical anomaly detection. Code and data will be released.
Related papers
- Reasoning-Driven Multimodal LLM for Domain Generalization [72.00754603114187]
We study the role of reasoning in domain generalization using DomainBed-Reasoning dataset.<n>We propose RD-MLDG, a framework with two components: MTCT (Multi-Task Cross-Training) and SARR (Self-Aligned Reasoning Regularization)<n>Experiments on standard DomainBed datasets demonstrate that RD-MLDG achieves complementary state-of-the-art performances.
arXiv Detail & Related papers (2026-02-27T08:10:06Z) - Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection [85.29900916231655]
Reason-IAD is a knowledge-guided dynamic latent reasoning framework for explainable industrial anomaly detection.<n>Experiments demonstrate that Reason-IAD consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2026-02-10T14:54:17Z) - Interpretable Logical Anomaly Classification via Constraint Decomposition and Instruction Fine-Tuning [0.17722218114340835]
We introduce Logical Anomaly Classification (LAC), a task that unifies anomaly detection and fine-grained violation classification in a single inference step.<n>To tackle LAC, we propose LogiCls, a vision-language framework that decomposes complex logical constraints into a sequence of verifiable subqueries.
arXiv Detail & Related papers (2026-02-03T13:48:09Z) - Advancing Adaptive Multi-Stage Video Anomaly Reasoning: A Benchmark Dataset and Method [96.63801368613177]
We present a new task that elevates video anomaly analysis from descriptive understanding to structured, multi-stage reasoning.<n>We present a new dataset with 8,641 videos, totaling more than 50,000 samples, making it one of the largest datasets for video anomaly understanding.<n>Building upon the proposed task and dataset, we develop an end-to-end MLLM-based VAR model termed Vad-R1-Plus, which supports adaptive hierarchical reasoning and risk-aware decision making.
arXiv Detail & Related papers (2026-01-15T08:09:04Z) - Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification [0.0]
This work introduces a cascading multi-agent framework that unifies complementary paradigms into a coherent and interpretable architecture.<n>Early modules perform reconstruction-gated filtering and object-level assessment, while higher-level reasoning agents are selectively invoked to interpret semantically ambiguous events.<n>The framework advances beyond conventional detection pipelines by combining early-exit efficiency, adaptive multi-agent reasoning, and explainable anomaly attribution, establishing a reproducible and energy-efficient foundation for scalable intelligent visual monitoring.
arXiv Detail & Related papers (2026-01-08T11:31:47Z) - Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models [72.4149653187766]
We propose a Reasoner-Verifier framework named Adrialversa Reasoning RAG (ARR)<n>The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by process-aware advantage.<n> Experiments on multiple benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2026-01-08T06:57:03Z) - VAR: Visual Attention Reasoning via Structured Search and Backtracking [49.427842994857635]
We introduce Visual Attention Reasoning, a framework that recasts grounded reasoning as a structured search.<n> VAR decomposes the reasoning process into two key stages: traceable evidence grounding and search-based chain-of-thought.<n>We show that our 7B model, VAR-7B, sets a new state-of-the-art on a comprehensive suite of hallucination and safety benchmarks.
arXiv Detail & Related papers (2025-10-21T13:18:44Z) - RationAnomaly: Log Anomaly Detection with Rationality via Chain-of-Thought and Reinforcement Learning [27.235259453535537]
RationAnomaly is a novel framework that enhances log anomaly detection by synergizing Chain-of-Thought fine-tuning with reinforcement learning.<n>We have released the corresponding resources, including code and datasets.
arXiv Detail & Related papers (2025-09-18T07:35:58Z) - SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment [12.388954043805235]
Vision-Language Models (VLMs) often struggle in industrial anomaly detection and reasoning.<n>SAGE is a VLM-based framework that enhances anomaly reasoning through Self-Guided Fact Enhancement (SFE) and Entropy-aware Direct Preference Optimization (E-DPO)<n>SAGE demonstrates superior performance on industrial anomaly datasets under zero-shot and one-shot settings.
arXiv Detail & Related papers (2025-07-10T17:23:42Z) - OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning [76.90511414963265]
We introduce OmniAD, a framework that unifies anomaly detection and understanding for fine-grained analysis.<n>Visual reasoning provides detailed inspection by leveraging Text-as-Mask.<n>Visual Guided Textual Reasoning conducts comprehensive analysis by integrating visual perception.
arXiv Detail & Related papers (2025-05-28T07:02:15Z) - Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding.<n>It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions.<n>Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT)<n>Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z) - Towards Training-free Anomaly Detection with Vision and Language Foundation Models [17.991678161890174]
Anomaly detection is valuable for real-world applications, such as industrial quality inspection.<n>We introduce LogSAD, a novel multi-modal framework that requires no training for both Logical and Structural Anomaly Detection.
arXiv Detail & Related papers (2025-03-24T04:07:59Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.
Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Unveiling Reasoning Thresholds in Language Models: Scaling, Fine-Tuning, and Interpretability through Attention Maps [3.8936716676293917]
This study investigates the in-context learning capabilities of various decoder-only transformer-based language models with different model sizes and training data.<n>We identify a critical parameter threshold (1.6 billion), beyond which reasoning performance improves significantly in tasks such as commonsense reasoning in multiple-choice question answering and deductive reasoning.
arXiv Detail & Related papers (2025-02-21T00:48:32Z) - From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning [71.41062111470414]
The proposed plug-and-play framework interfaces with any open-vocabulary detector.<n>At its core, our approach combines (i) a symbolic regression mechanism exploring relationship patterns among detected entities.<n>We compared our training-free framework against specialized event recognition systems across diverse application domains.
arXiv Detail & Related papers (2025-02-09T10:30:54Z) - LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction [4.959108380494595]
Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning.<n>In this work, we investigate using AVLMs for logical anomaly detection and demonstrate that they are well-suited to the task.<n>We achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.4% and F1-max of 83.7%, along with explanations of anomalies.
arXiv Detail & Related papers (2025-01-03T11:40:41Z) - Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection [2.3020018305241337]
Industrial anomaly detection is crucial for quality control and predictive maintenance.
Existing methods commonly detect structural anomalies, such as dents and scratches, by leveraging multi-scale features from image patches extracted through deep pre-trained networks.
We address these limitations by focusing on Deep Feature Reconstruction (DFR), a memory- and compute-efficient approach for detecting structural anomalies.
We further enhance DFR into a unified framework, called ULSAD, which is capable of detecting both structural and logical anomalies.
arXiv Detail & Related papers (2024-10-21T17:56:47Z) - MLAD: A Unified Model for Multi-system Log Anomaly Detection [35.68387377240593]
We propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems.
Specifically, we employ Sentence-bert to capture the similarities between log sequences and convert them into highly-dimensional learnable semantic vectors.
We revamp the formulas of the Attention layer to discern the significance of each keyword in the sequence and model the overall distribution of the multi-system dataset.
arXiv Detail & Related papers (2024-01-15T12:51:13Z) - Small Object Detection via Coarse-to-fine Proposal Generation and
Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z) - Finding Alignments Between Interpretable Causal Variables and
Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence.
Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one.
We present distributed alignment search (DAS), which overcomes these limitations.
arXiv Detail & Related papers (2023-03-05T00:57:49Z) - ESAD: End-to-end Deep Semi-supervised Anomaly Detection [85.81138474858197]
We propose a new objective function that measures the KL-divergence between normal and anomalous data.
The proposed method significantly outperforms several state-of-the-arts on multiple benchmark datasets.
arXiv Detail & Related papers (2020-12-09T08:16:35Z) - An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.