Fugu-MT Paper Translation (Summary): MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Paper Summary: MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

arxiv url: http://arxiv.org/abs/2603.28130v1
Date: Mon, 30 Mar 2026 07:47:46 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.287189
Title: MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Title (reference translation): MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
Authors: Zhang Li, Zhibo Lin, Qiang Liu, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiajun Song, Jiarui Zhang, Xiang Bai, Yuliang Liu,
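Abstract summary: We introduce the Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. MDPBench comprises 3,400 document images spanning 17 languages, diverse scripts, and varied photographic conditions.
Reference score (in-house attention score): 72.8160644291677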
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce the Multilingual Document Parsing Benchmark (MDPBench), the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable strides, yet almost exclusively on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluate how models perform on digital and photographed documents across diverse scripts and low-resource languages. MDPBench comprises 3,400 document images spanning 17 languages, diverse scripts, and varied photographic conditions, with high-quality annotations produced through a rigorous pipeline of expert model labeling, manual correction, and human verification. To ensure fair comparison and prevent data leakage, we maintain separate public and private evaluation splits. Our comprehensive evaluation of both open-source and closed-source models uncovers a striking finding: while closed-source models (notably Gemini3-Pro) prove relatively robust, open-source alternatives suffer dramatic performance collapse, particularly on non-Latin scripts and real-world photographed documents, with an average drop of 17.8% on photographed documents and 14.0% on non-Latin scripts. These results reveal significant performance imbalances across languages and conditions, and point to concrete directions for building more inclusive, deployment-ready parsing systems. The code is available at https://github.com/Yuliang-Liu/MultimodalOCR.
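Abstract (reference translation): We introduce the Multilingual Document Parsing Benchmark, the first benchmark for multilingual digital and photographed document parsing. Document parsing has made remarkable progress, but mostly on clean, digital, well-formatted pages in a handful of dominant languages. No systematic benchmark exists to evaluate how models perform on digital and photographed documents across diverse scripts and low-resource languages. MDPBench consists of 3,400 document images spanning 17 languages, diverse scripts, and varied photographic conditions, with high-quality annotations produced through rigorous expert-model labeling, manual correction, and human verification. To ensure fair comparison and prevent data leakage, we maintain separate public and private evaluation splits. While closed-source models (notably Gemini3-Pro) are relatively robust, open-source models suffer dramatic performance drops, especially on non-Latin scripts and real-world photographed documents, with an average decrease of 17.8% on photographed documents and 14.0% on non-Latin scripts. These results reveal substantial performance imbalances across languages and conditions and suggest concrete directions for building more inclusive, deployment-ready parsing systems. The source code is available at https://github.com/Yuliang-Liu/MultimodalOCR.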
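The abstract reports an average drop of 17.8% on photographed documents and 14.0% on non-Latin scripts, but does not say here whether a "drop" is an absolute difference in score points or a change relative to the reference score. The minimal sketch below shows both readings: for each model it takes the difference between a reference subset (e.g. digital pages) and a harder subset (e.g. photographed pages), then averages over models. The function name `average_drop`, the 0 to 100 score scale, and the model names and scores are all hypothetical placeholders for illustration, not MDPBench's actual metric or results.

```python
from statistics import mean


def average_drop(scores_ref, scores_hard, relative=False):
    """Mean per-model score drop from a reference subset to a harder subset.

    Hypothetical helper, not part of MDPBench.
    scores_ref / scores_hard: dict mapping model name -> score (assumed 0-100).
    relative=False: drop in absolute points.
    relative=True:  drop as a percentage of the model's reference score.
    """
    # Only compare models that were scored on both subsets.
    models = scores_ref.keys() & scores_hard.keys()
    if not models:
        raise ValueError("no model has scores on both subsets")
    drops = []
    for m in sorted(models):
        d = scores_ref[m] - scores_hard[m]
        if relative:
            # Assumes a nonzero reference score.
            d = 100.0 * d / scores_ref[m]
        drops.append(d)
    return mean(drops)


# Hypothetical placeholder scores, NOT results reported for MDPBench.
digital = {"model_a": 90.0, "model_b": 80.0}
photographed = {"model_a": 72.0, "model_b": 60.0}
print(average_drop(digital, photographed))                 # 19.0 (points)
print(average_drop(digital, photographed, relative=True))  # 22.5 (percent)
```

The two readings can differ noticeably (19.0 points versus 22.5% on the same toy scores), so a percentage drop is only comparable across papers when the definition is stated.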

Related Papers