Fugu-MT Paper Translation (Summary): MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

Paper summary: MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

arxiv url: http://arxiv.org/abs/2604.10971v1
Date: Mon, 13 Apr 2026 04:14:56 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.314808
Title: MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models
Title (reference translation): MMR-AD: A large-scale multimodal dataset for benchmarking general anomaly detection using multimodal large language models
Authors: Xincheng Yao, Zefeng Qian, Chao Shi, Jiayang Song, Chongyang Zhang
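Abstract summary: We propose MMR-AD, a benchmark for training and evaluating MLLM-based AD models. We also propose Anomaly-R1, a reasoning-based AD model that learns from CoT data. We show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
Reference score (independently computed attention score): 16.60737807862461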
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike the conventional single- and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise in achieving general anomaly detection due to their revolutionary visual understanding and language reasoning capabilities. However, MLLM's general AD ability remains underexplored due to: (1) MLLMs are pretrained on amounts of data sourced from the Web, these data still have significant gaps with the data in AD scenarios. Moreover, the image-text pairs during pretraining are also not specifically for AD tasks. (2) The current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, which is a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current SOTA generalist MLLMs still falls far behind the industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, which is a reasoning-based AD model that learns from the CoT data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that our Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.
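Abstract (reference translation): In the progress of industrial anomaly detection, general anomaly detection (GAD) is an emerging trend and also the ultimate goal. Unlike conventional single-class and multi-class AD, general AD aims to train a general AD model that can directly detect anomalies in diverse novel classes without any retraining or fine-tuning on the target data. Recently, Multimodal Large Language Models (MLLMs) have shown great promise for general anomaly detection thanks to their revolutionary visual understanding and language reasoning capabilities. However, the general AD ability of MLLMs remains underexplored for two reasons: (1) MLLMs are pretrained on large amounts of data sourced from the Web, and these data still differ significantly from the data in AD scenarios; moreover, the image-text pairs used during pretraining are not specific to AD tasks. (2) The current mainstream AD datasets are image-based and not yet suitable for post-training MLLMs. To facilitate MLLM-based general AD research, we present MMR-AD, a comprehensive benchmark for both training and evaluating MLLM-based AD models. With MMR-AD, we reveal that the AD performance of current SOTA generalist MLLMs still falls far behind industrial requirements. Based on MMR-AD, we also propose a baseline model, Anomaly-R1, a reasoning-based AD model that learns from the CoT data in MMR-AD and is further enhanced by reinforcement learning. Extensive experiments show that Anomaly-R1 achieves remarkable improvements over generalist MLLMs in both anomaly detection and localization.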

Related papers