Fugu-MT paper translation (abstract): Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

Paper overview: Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

arxiv url: http://arxiv.org/abs/2605.10550v2
Date: Thu, 14 May 2026 12:54:59 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.38239
Title: Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy
Authors: Denghao Ma, Qing Liu, Zulong Chen, Chuanfei Xu, Jia Xu, Zhibo Yang, Wei Shao, Zhao Li,
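Abstract summary: The authors construct a Multi-level, Multi-domain, Multi-modal document classification benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that captures the authentic organizational logic of business documentation, and (2) 5,990 real-world multi-modal documents carefully curated from 12 commercial domains at Alibaba.
Reference score (attention score computed independently by this site): 14.888842472004262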
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms -- single domain settings with flat label structures -- that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that capture the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete hierarchical path by domain experts. We establish comprehensive baselines on MMM-Bench, which consists of open-weight models and API-based models. Through systematic experiments, we identify four fundamental challenges within MMM-Bench and propose corresponding insights. To provide a solid foundation for advancing research in multi-level, multi-domain document classification, we release all of the data and the evaluation toolkit of MMM-Bench at https://github.com/MMMDC-Bench/MMMDC-Bench.
