Fugu-MT Paper Translation (Summary): Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

Paper Summary: Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA

arxiv url: http://arxiv.org/abs/2604.22239v1
Date: Fri, 24 Apr 2026 05:28:51 GMT
Status: Translation complete
Last updated in system: 2026-04-27 15:36:26.354417
Title: Navigating Large-Scale Document Collections: MuDABench for Multi-Document Analytical QA
Authors: Zhanli Li, Yixuan Cao, Lvzhou Luo, Ping Luo
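Abstract summary: This paper introduces the task of analytical question answering over large, semi-structured document collections and proposes MuDABench, a benchmark for multi-document analytical QA.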
Reference score (site-computed attention score): 25.155696504567718
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This paper introduces the task of analytical question answering over large, semi-structured document collections. We present MuDABench, a benchmark for multi-document analytical QA, where questions require extracting and synthesizing information across numerous documents to perform quantitative analysis. Unlike existing multi-document QA benchmarks that typically require information from only a few documents with limited cross-document reasoning, MuDABench demands extensive inter-document analysis and aggregation. Constructed via distant supervision by leveraging document-level metadata and annotated financial databases, MuDABench comprises over 80,000 pages and 332 analytical QA instances. We also propose an evaluation protocol that measures final answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic signal for the reasoning process. Experiments reveal that standard RAG systems, which treat all documents as a flat retrieval pool, perform poorly. To address these limitations, we propose a multi-agent workflow that orchestrates planning, extraction, and code generation modules. While this approach substantially improves both process and outcome metrics, a significant gap remains compared to human expert performance. Our analysis identifies two primary bottlenecks: single-document information extraction accuracy and insufficient domain-specific knowledge in current systems. MuDABench is available at https://github.com/Zhanli-Li/MuDABench.
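To make the abstract's contrast between a flat retrieval pool and a staged workflow concrete, here is a minimal sketch of a plan, per-document extraction, and aggregation pipeline, together with an intermediate-fact coverage score. This is not the authors' implementation. All names (`Document`, `Plan`, `stub_planner`, `extract`, `answer`, `fact_coverage`, `AGGREGATES`) are hypothetical. The planner and extractor are deterministic stubs standing in for LLM calls, and a fixed table of aggregation functions is swapped in for the paper's code-generation module. Coverage is scored here as the fraction of gold per-document values recovered within a relative tolerance. That is one plausible reading of "intermediate-fact coverage", not the paper's definition.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Document:
    doc_id: str
    metadata: Dict[str, object]  # document-level metadata, e.g. {"year": 2023}
    text: str


@dataclass
class Plan:
    filters: Dict[str, object]  # metadata constraints that select documents
    target: str                 # field to extract from each selected document
    aggregate: str              # key into AGGREGATES (stands in for generated code)


# Hypothetical stand-in for the code-generation module: a fixed menu of aggregations.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "sum": sum,
    "max": max,
    "min": min,
}


def stub_planner(question: str) -> Plan:
    """Hypothetical planner: ignores the question and returns a fixed plan.
    A real system would derive the plan from the question, e.g. with an LLM."""
    return Plan(filters={"year": 2023}, target="revenue", aggregate="mean")


def extract(doc: Document, target: str) -> Optional[float]:
    """Hypothetical single-document extractor: reads 'key: value' lines."""
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == target:
            return float(value)
    return None


def answer(
    question: str,
    corpus: List[Document],
    planner: Callable[[str], Plan] = stub_planner,
) -> Tuple[float, Dict[str, float]]:
    """Plan, extract per document, then aggregate; returns (answer, intermediate facts)."""
    plan = planner(question)
    # Step 1: narrow the collection by metadata instead of searching a flat pool.
    selected = [
        d for d in corpus
        if all(d.metadata.get(k) == v for k, v in plan.filters.items())
    ]
    # Step 2: extract one value per selected document (the intermediate facts).
    facts: Dict[str, float] = {}
    for d in selected:
        value = extract(d, plan.target)
        if value is not None:
            facts[d.doc_id] = value
    # Step 3: aggregate the extracted values.
    return AGGREGATES[plan.aggregate](list(facts.values())), facts


def fact_coverage(
    predicted: Dict[str, float], gold: Dict[str, float], rel_tol: float = 1e-2
) -> float:
    """Fraction of gold intermediate facts recovered within a relative tolerance."""
    if not gold:
        return 1.0
    hits = sum(
        1 for doc_id, value in gold.items()
        if doc_id in predicted and math.isclose(predicted[doc_id], value, rel_tol=rel_tol)
    )
    return hits / len(gold)


if __name__ == "__main__":
    corpus = [
        Document("A", {"year": 2023}, "revenue: 10\nprofit: 2"),
        Document("B", {"year": 2023}, "revenue: 20"),
        Document("C", {"year": 2022}, "revenue: 99"),
        Document("D", {"year": 2023}, "profit: 5"),
    ]
    result, facts = answer("What was the mean 2023 revenue?", corpus)
    print(result, facts)  # 15.0 {'A': 10.0, 'B': 20.0}
    print(fact_coverage(facts, {"A": 10.0, "B": 20.0, "D": 5.0}))
```

The sketch filters documents by metadata first and then processes each one individually. As a result, the final number comes from explicit aggregation over extracted values rather than from whichever chunks a flat retrieval pool returns. It also shows why extraction errors matter: a single missed or wrong per-document value lowers coverage and shifts the aggregate.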

Related Papers