Fugu-MT Paper Translation (Summary): UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression

Paper summary: UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression

arxiv url: http://arxiv.org/abs/2509.25934v1
Date: Tue, 30 Sep 2025 08:29:12 GMT
Status: Translation complete
System update date: 2025-10-01 14:45:00.067347
Title: UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression
Authors: Yuan Zhao, Youwei Pang, Lihe Zhang, Hanqi Liu, Jiaming Zuo, Huchuan Lu, Xiaoqi Zhao
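Abstract summary: UniMMAD is a unified framework for multi-modal and multi-class anomaly detection. UniMMAD achieves state-of-the-art performance on 9 anomaly detection datasets spanning 3 fields, 12 modalities, and 66 classes.
Reference score (attention score computed by the site): 74.0893986012049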
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing anomaly detection (AD) methods often treat the modality and class as independent factors. Although this paradigm has enriched the development of AD research branches and produced many specialized models, it has also led to fragmented solutions and excessive memory overhead. Moreover, reconstruction-based multi-class approaches typically rely on shared decoding paths, which struggle to handle large variations across domains, resulting in distorted normality boundaries, domain interference, and high false alarm rates. To address these limitations, we propose UniMMAD, a unified framework for multi-modal and multi-class anomaly detection. At the core of UniMMAD is a Mixture-of-Experts (MoE)-driven feature decompression mechanism, which enables adaptive and disentangled reconstruction tailored to specific domains. This process is guided by a "general to specific" paradigm. In the encoding stage, multi-modal inputs of varying combinations are compressed into compact, general-purpose features. The encoder incorporates a feature compression module to suppress latent anomalies, encourage cross-modal interaction, and avoid shortcut learning. In the decoding stage, the general features are decompressed into modality-specific and class-specific forms via a sparsely-gated cross MoE, which dynamically selects expert pathways based on input modality and class. To further improve efficiency, we design a grouped dynamic filtering mechanism and a MoE-in-MoE structure, reducing parameter usage by 75% while maintaining sparse activation and fast inference. UniMMAD achieves state-of-the-art performance on 9 anomaly detection datasets, spanning 3 fields, 12 modalities, and 66 classes. The source code will be available at https://github.com/yuanzhao-CVLAB/UniMMAD.
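The abstract says the decoder turns shared, general features into modality-specific and class-specific ones through a sparsely-gated MoE that picks expert pathways from the input's modality and class. The sketch below illustrates only that generic idea: top-k sparse gating whose router sees a (modality, class) pair, where just the selected experts are evaluated. It is not the authors' implementation. Every name (`ConditionedSparseMoE`, `LinearExpert`, `route`), the single-linear-map experts, the additive modality/class router logits, `top_k=2`, and the random (untrained) initialization are assumptions for illustration. The "cross" aspect, the feature compression module, grouped dynamic filtering, and the MoE-in-MoE structure are not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class LinearExpert:
    """Hypothetical expert: one dim x dim linear map (a stand-in for a real decoder block)."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        scale = 1.0 / math.sqrt(dim)
        self.weight = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


class ConditionedSparseMoE:
    """Hypothetical top-k sparsely gated MoE whose router sees only (modality, class) IDs."""

    def __init__(self, dim: int, num_experts: int, num_modalities: int,
                 num_classes: int, top_k: int = 2, seed: int = 0) -> None:
        if not 1 <= top_k <= num_experts:
            raise ValueError("top_k must be in [1, num_experts]")
        rng = random.Random(seed)
        self.top_k = top_k
        self.experts = [LinearExpert(dim, rng) for _ in range(num_experts)]
        # Assumed design: one logit vector per modality and per class, summed at routing time.
        self.modality_logits = [[rng.gauss(0.0, 1.0) for _ in range(num_experts)]
                                for _ in range(num_modalities)]
        self.class_logits = [[rng.gauss(0.0, 1.0) for _ in range(num_experts)]
                             for _ in range(num_classes)]

    def route(self, modality: int, cls: int) -> Tuple[List[int], List[float]]:
        """Select the top-k experts for this (modality, class) pair and their mixing weights."""
        logits = [m + c for m, c in zip(self.modality_logits[modality], self.class_logits[cls])]
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        return chosen, weights

    def __call__(self, x: Sequence[float], modality: int, cls: int) -> Tuple[List[float], List[int]]:
        """Map a shared feature x to a routed output; only the chosen experts run (sparse activation)."""
        chosen, weights = self.route(modality, cls)
        out = [0.0] * len(x)
        for idx, w in zip(chosen, weights):
            y = self.experts[idx](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, chosen


# Usage: the same general feature, routed for two different modalities of one class.
moe = ConditionedSparseMoE(dim=4, num_experts=8, num_modalities=3, num_classes=5, top_k=2)
feature = [1.0, 0.5, -0.5, 2.0]
out_a, experts_a = moe(feature, modality=0, cls=3)
out_b, experts_b = moe(feature, modality=1, cls=3)
print(experts_a, experts_b)
```

Because routing here depends only on the (modality, class) pair, every sample of the same pair follows the same expert path, while other pairs may use different experts; in a trained model the router logits would be learned jointly with the experts rather than drawn at random.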
