IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection
- URL: http://arxiv.org/abs/2508.09178v2
- Date: Thu, 14 Aug 2025 15:30:10 GMT
- Title: IAD-R1: Reinforcing Consistent Reasoning in Industrial Anomaly Detection
- Authors: Yanhui Li, Yunkang Cao, Chengliang Liu, Yuan Xiong, Xinghui Dong, Chao Huang,
- Abstract summary: IAD-R1, a universal post-training framework, substantially enhances anomaly detection capabilities. IAD-R1 achieves significant improvements across 7 Vision-Language Models (VLMs). IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings.
- Score: 11.178131621535261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial anomaly detection is a critical component of modern manufacturing, yet the scarcity of defective samples restricts traditional detection methods to scenario-specific applications. Although Vision-Language Models (VLMs) demonstrate significant advantages in generalization capabilities, their performance in industrial anomaly detection remains limited. To address this challenge, we propose IAD-R1, a universal post-training framework applicable to VLMs of different architectures and parameter scales, which substantially enhances their anomaly detection capabilities. IAD-R1 employs a two-stage training strategy: the Perception Activation Supervised Fine-Tuning (PA-SFT) stage utilizes a meticulously constructed high-quality Chain-of-Thought dataset (Expert-AD) for training, enhancing anomaly perception capabilities and establishing reasoning-to-answer correlations; the Structured Control Group Relative Policy Optimization (SC-GRPO) stage employs carefully designed reward functions to achieve a capability leap from "Anomaly Perception" to "Anomaly Interpretation". Experimental results demonstrate that IAD-R1 achieves significant improvements across 7 VLMs; the largest improvement is on the DAGM dataset, where average accuracy is 43.3% higher than the 0.5B baseline. Notably, the 0.5B parameter model trained with IAD-R1 surpasses commercial models including GPT-4.1 and Claude-Sonnet-4 in zero-shot settings, demonstrating the effectiveness and superiority of IAD-R1. The dataset, code, and all model weights will be publicly available at https://github.com/Yanhui-Lee/IAD-R1.
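The SC-GRPO stage above builds on Group Relative Policy Optimization, which scores several sampled responses per prompt and normalizes each reward against its group. As a rough, generic sketch of that group-relative baseline (not IAD-R1's actual SC-GRPO reward design, whose reward functions are described in the paper):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each scalar reward against the
    group of G responses sampled for the same prompt.

    Generic illustration only; the function name and `eps` smoothing
    are this sketch's choices, not IAD-R1's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers scored 1.0 (correct) or 0.0 (incorrect).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses scoring above the group mean receive positive advantages and are reinforced; those below are penalized, with no learned value network required.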
Related papers
- STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning. We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z) - AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards [60.2998874976509]
We propose advantage-weighted policy optimization (AWPO), which integrates explicit reasoning rewards to enhance tool-use capability. AWPO incorporates variance-aware gating and difficulty-aware weighting to adaptively modulate advantages from reasoning signals. Experiments demonstrate that AWPO achieves state-of-the-art performance across standard tool-use benchmarks.
arXiv Detail & Related papers (2025-12-22T08:07:00Z) - DARN: Dynamic Adaptive Regularization Networks for Efficient and Robust Foundation Model Adaptation [0.0]
We introduce Dynamic Adaptive Regularization Networks (DARN). DARN integrates three key innovations: a lightweight Task Complexity Predictor (TCP) that estimates per-sample difficulty, Adaptive Dropout Modulation (ADM), and Dynamic Capacity Gating (DCG). In full fine-tuning (unfrozen backbone), DARN achieves a new state-of-the-art on the multi-task GeoBench benchmark (86.66% mIoU, +5.56 pp over prior SOTA). In efficient adaptation (frozen backbone), DARN achieves SOTA-competitive accuracy (90.5% mIoU on Sen
arXiv Detail & Related papers (2025-11-06T19:36:49Z) - Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail [85.47497935739936]
Alpamayo-R1 (AR1) is a vision-language-action model that integrates Chain of Causation reasoning with trajectory planning. We show AR1 achieves a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline. We plan to release AR1 models and a subset of the CoC in a future update.
arXiv Detail & Related papers (2025-10-30T01:25:34Z) - A Fuzzy Logic-Based Framework for Explainable Machine Learning in Big Data Analytics [0.0]
This paper presents a novel framework that combines type-2 fuzzy sets, granular computing, and clustering to boost explainability and fairness in big data environments. When applied to the UCI Air Quality dataset, the framework effectively manages uncertainty in noisy sensor data, produces linguistic rules, and assesses fairness using silhouette scores and entropy.
arXiv Detail & Related papers (2025-09-29T18:02:31Z) - Perception-Aware Policy Optimization for Multimodal Reasoning [79.56070395437898]
A major source of error in current multimodal reasoning lies in the perception of visual inputs. We propose PAPO, a novel policy gradient algorithm that encourages the model to learn to perceive while learning to reason. We observe a substantial reduction of 30.5% in perception errors, indicating improved perceptual capabilities with PAPO.
arXiv Detail & Related papers (2025-07-08T23:22:34Z) - SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents [58.174206358223415]
Self-Evolving Embodied Agents-R1, or SEEA-R1, is the first reinforcement fine-tuning framework designed for self-evolving embodied agents. We show that SEEA-R1 can support autonomous adaptation and reward-driven self-evolution.
arXiv Detail & Related papers (2025-06-26T18:00:07Z) - DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization [55.06360285372418]
Group Relative Policy Optimization (GRPO) is a reinforcement learning method for large reasoning models (LRMs). In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent question-level difficulty bias. We introduce a new Discriminative Constrained Optimization framework for reinforcing LRMs, grounded in the principle of discriminative learning.
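The difficulty bias mentioned in this blurb follows from GRPO's group normalization under binary rewards: with per-question success rate p, the group standard deviation is sqrt(p(1-p)), so the normalized advantage of a correct response scales with question difficulty. A small numeric sketch (my own illustration of that arithmetic, not DisCO's objective):

```python
import math

def correct_answer_advantage(p):
    """Normalized advantage of a correct response under a binary reward,
    using population statistics over an (idealized, infinite) group:
    (1 - p) / sqrt(p * (1 - p)), where p is the success rate.
    Illustrative only; assumes 0 < p < 1.
    """
    sigma = math.sqrt(p * (1.0 - p))
    return (1.0 - p) / sigma

easy = correct_answer_advantage(0.9)  # mostly-solved question
hard = correct_answer_advantage(0.1)  # rarely-solved question
```

The magnitude of the update thus depends on how hard the question is (here the rarely-solved case yields a 9x larger advantage than the mostly-solved one), which is the question-level bias the DisCO framework targets.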
arXiv Detail & Related papers (2025-05-18T11:08:32Z) - AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection [40.34270276536052]
Industrial Anomaly Detection (IAD) poses a formidable challenge due to the scarcity of defective samples. Traditional approaches, often constrained by hand-crafted features or domain-specific expert models, struggle to address this limitation. We introduce AnomalyR1, a pioneering framework that leverages VLM-R1, a Multimodal Large Language Model (MLLM) renowned for its exceptional generalization and interpretability.
arXiv Detail & Related papers (2025-04-16T09:48:41Z) - Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning [13.642017219922238]
Rec-R1 bridges large language models (LLMs) with recommendation systems through closed-loop optimization. Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimizes LLM generation using feedback from a fixed black-box recommendation model.
arXiv Detail & Related papers (2025-03-31T16:36:00Z) - EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models [23.898938659720503]
Industrial Anomaly Detection (IAD) is critical to ensure product quality during manufacturing. We propose a novel approach that introduces a dedicated multi-modal defect localization module to decouple the dialog functionality from the core feature extraction. We also contribute the first multi-modal industrial anomaly detection training dataset, named Defect Detection Question Answering (DDQA).
arXiv Detail & Related papers (2025-03-18T11:33:29Z) - RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration [2.879328762187361]
We present RAAD-LLM, a novel framework for adaptive anomaly detection. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data. Results show significant improvements over our previous model, with an accuracy increase from 70.7% to 88.6% on the real-world dataset.
arXiv Detail & Related papers (2025-03-04T17:20:43Z) - Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Prediction of SLAM ATE Using an Ensemble Learning Regression Model and
1-D Global Pooling of Data Characterization [3.4399698738841553]
We introduce a novel method for predicting SLAM localization error based on the characterization of raw sensor inputs.
The proposed method relies on using a random forest regression model trained on 1-D global pooled features that are generated from characterized raw sensor data.
The paper also studies the impact of 12 different 1-D global pooling functions on regression quality, and the superiority of 1-D global averaging is quantitatively proven.
arXiv Detail & Related papers (2023-03-01T16:12:47Z) - Adversarially Adaptive Normalization for Single Domain Generalization [71.80587939738672]
We propose a generic normalization approach, adaptive standardization and rescaling normalization (ASR-Norm).
ASR-Norm learns both the standardization and rescaling statistics via neural networks.
We show that ASR-Norm can bring consistent improvement to the state-of-the-art ADA approaches.
arXiv Detail & Related papers (2021-06-01T23:58:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.