Related papers: MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation

MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation

URL: http://arxiv.org/abs/2602.07011v1
Date: Sat, 31 Jan 2026 05:36:49 GMT
Title: MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation
Authors: Zhuonan Wang, Zhenxuan Fan, Siwen Tan, Yu Zhong, Yuqian Yuan, Haoyuan Li, Hao Jiang, Wenqiao Zhang, Feifei Shao, Hongwei Wang, Jun Xiao,
Abstract summary: We introduce MAU-Set, a comprehensive dataset for Multi-type industrial Anomaly Understanding.<n>We then present MAU-GPT, a domain-adapted multimodal large model specifically designed for industrial anomaly understanding.<n>It incorporates a novel AMoE-LoRA mechanism that unifies anomaly-aware and generalist experts adaptation, enhancing both understanding and reasoning across diverse defect classes.
Score: 31.60185302007424
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As industrial manufacturing scales, automating fine-grained product image analysis has become critical for quality control. However, existing approaches are hindered by limited dataset coverage and poor model generalization across diverse and complex anomaly patterns. To address these challenges, we introduce MAU-Set, a comprehensive dataset for Multi-type industrial Anomaly Understanding. It spans multiple industrial domains and features a hierarchical task structure, ranging from binary classification to complex reasoning. Alongside this dataset, we establish a rigorous evaluation protocol to facilitate fair and comprehensive model assessment. Building upon this foundation, we further present MAU-GPT, a domain-adapted multimodal large model specifically designed for industrial anomaly understanding. It incorporates a novel AMoE-LoRA mechanism that unifies anomaly-aware and generalist experts adaptation, enhancing both understanding and reasoning across diverse defect classes. Extensive experiments show that MAU-GPT consistently outperforms prior state-of-the-art methods across all domains, demonstrating strong potential for scalable and automated industrial inspection.

Related papers

Reasoning-Driven Multimodal LLM for Domain Generalization [72.00754603114187]
We study the role of reasoning in domain generalization using DomainBed-Reasoning dataset.<n>We propose RD-MLDG, a framework with two components: MTCT (Multi-Task Cross-Training) and SARR (Self-Aligned Reasoning Regularization)<n>Experiments on standard DomainBed datasets demonstrate that RD-MLDG achieves complementary state-of-the-art performances.
arXiv Detail & Related papers (2026-02-27T08:10:06Z)
OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.<n>Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.<n>We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
SERM: Self-Evolving Relevance Model with Agent-Driven Learning from Massive Query Streams [53.78257200138774]
We propose a Self-Evolving Relevance Model approach (SERM), which comprises two complementary multi-agent modules.<n>We evaluate SERM in a large-scale industrial setting, which serves billions of user requests daily.
arXiv Detail & Related papers (2026-01-14T14:31:16Z)
Real-IAD Variety: Pushing Industrial Anomaly Detection Dataset to a Modern Era [110.83702639978469]
Real-IAD Variety is the largest and most diverse IAD benchmark, comprising 198,960 high-resolution images across 160 distinct object categories.<n>Its diversity is ensured through comprehensive coverage of 28 industries, 24 material types, and 22 color variations.<n>Real-IAD Variety will be made publicly available to facilitate innovation in this critical field.
arXiv Detail & Related papers (2025-11-01T12:58:02Z)
AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection [29.06542941993374]
AnomalyMoE is a novel and universal anomaly detection framework based on a Mixture-of-Experts architecture.<n>Our key insight is to decompose the complex anomaly detection problem into three distinct semantic hierarchies.<n>AnomalyMoE employs three dedicated expert networks at the patch, component, and global levels.
arXiv Detail & Related papers (2025-08-08T10:33:18Z)
AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization [43.86757207244911]
We propose a comprehensive framework addressing limitations through two synergistic innovations.<n>First, we introduce a multi-stage deliberative reasoning process that guides models from region identification to focused examination.<n>Second, we develop a fine-grained reward mechanism incorporating classification accuracy and localization supervision.
arXiv Detail & Related papers (2025-08-06T08:00:27Z)
Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection [53.137651284042434]
Anomaly inspection plays a vital role in industrial manufacturing, but the scarcity of anomaly samples limits the effectiveness of existing methods.<n>We propose Generate grained Anomaly (GAA), a region-guided, few-shot anomaly image-mask pair generation framework.<n>GAA generates realistic, diverse, and semantically aligned anomalies using only a small number of samples.
arXiv Detail & Related papers (2025-07-13T12:56:59Z)
SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment [12.388954043805235]
Vision-Language Models (VLMs) often struggle in industrial anomaly detection and reasoning.<n>SAGE is a VLM-based framework that enhances anomaly reasoning through Self-Guided Fact Enhancement (SFE) and Entropy-aware Direct Preference Optimization (E-DPO)<n>SAGE demonstrates superior performance on industrial anomaly datasets under zero-shot and one-shot settings.
arXiv Detail & Related papers (2025-07-10T17:23:42Z)
Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection [40.34270276536052]
Industrial Anomaly Detection (IAD) poses a formidable challenge due to the scarcity of defective samples.<n>Traditional approaches, often constrained by hand-crafted features or domain-specific expert models, struggle to address this limitation.<n>We introduce AnomalyR1, a pioneering framework that leverages VLM-R1, a Multimodal Large Language Model (MLLM) renowned for its exceptional generalization and interpretability.
arXiv Detail & Related papers (2025-04-16T09:48:41Z)
Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? [5.979778557940213]
Traditional industrial anomaly detection models often struggle with flexibility and adaptability.<n>Recent advancements in Multimodal Large Language Models (MLLMs) hold promise for overcoming these limitations.<n>We propose Echo, a novel multi-expert framework designed to enhance MLLM performance for IAD.
arXiv Detail & Related papers (2025-01-27T05:41:10Z)
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection [86.24898024621008]
We present a novel large multimodal model applying vision experts for industrial anomaly detection(abbreviated to Myriad)<n>We utilize the anomaly map generated by the vision experts as guidance for LMMs, such that the vision model is guided to pay more attention to anomalous regions.<n>Our proposed method not only performs favorably against state-of-the-art methods, but also inherits the flexibility and instruction-following ability of LMMs in the field of IAD.
arXiv Detail & Related papers (2023-10-29T16:49:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.