Fugu-MT Paper Translation (Summary): AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora

Paper summary: AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora

arxiv url: http://arxiv.org/abs/2605.25382v2
Date: Tue, 26 May 2026 10:32:44 GMT
Status: Translation complete
Last updated (in system): 2026-05-27 17:51:41.09688
Title: AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora
Title (reference translation): AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora
Authors: Xiaoqing Wu, Feifei Li, Haoliang Ming, Wenhui Que,
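Abstract summary: AuthTrace is a diagnostic benchmark built on thematically dense single-author corpora. AuthTrace provides explicit quoted evidence, exact fan-in annotation, and a unified pack-level protocol that measures evidence recall, evidence precision, and answer correctness.
Reference score (attention score computed in-house): 6.956097396264084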
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Evidence construction--the stage that determines which passages reach the language model before generation begins--is evaluated paradigm by paradigm, leaving practitioners with no principled way to diagnose which organization strategy fails, where, or why. We introduce AuthTrace, a diagnostic benchmark built on thematically dense single-author corpora where near-miss distractors share style, topic, and vocabulary with the required evidence. AuthTrace provides explicit quoted evidence, exact fan-in annotation, and a unified pack-level protocol measuring evidence recall, evidence precision, and answer correctness. A fan-in gradient--the number of source documents required to support the answer--serves as the primary diagnostic axis, enabling controlled comparison across retrieval, memory, graph, and structured-evidence paradigms. Evaluating eight systems across two QA models, we find that evidence recall is the strongest observed predictor of answer correctness under the primary reader-judge pair (r = 0.96); most failures stem from missing evidence rather than answer synthesis. Fan-in further exposes paradigm-specific collapse patterns: flat retrieval degrades 2-3x faster than thematically organized evidence construction. These results show fan-in decomposition to be a reusable diagnostic lens for identifying where evidence-construction systems fail and which paradigm best serves a given workload.
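Abstract (reference translation): Evidence construction, the stage that determines which passages reach the language model before generation begins, is evaluated paradigm by paradigm. AuthTrace is a diagnostic benchmark built on thematically dense single-author corpora, in which near-miss distractors share style, topic, and vocabulary with the required evidence. AuthTrace provides explicit quoted evidence, exact fan-in annotation, and a unified pack-level protocol that measures evidence recall, evidence precision, and answer correctness. The fan-in gradient, i.e. the number of source documents required to support the answer, serves as the primary diagnostic axis and enables controlled comparison across retrieval, memory, graph, and structured-evidence paradigms. Evaluating eight systems across two QA models, we find that evidence recall is the strongest predictor of answer correctness under the primary reader-judge pair (r = 0.96). Flat retrieval degrades 2-3x faster than thematically organized evidence construction. These results show that fan-in decomposition is a reusable diagnostic lens for identifying where evidence-construction systems fail and which paradigm best serves a given workload.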
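
The abstract names three pack-level quantities (evidence recall, evidence precision, and fan-in) and two analyses (correlating recall with answer correctness, and comparing how fast systems degrade along the fan-in gradient). The Python sketch that follows shows one plausible way to compute them; it is not the AuthTrace code. The `Passage` type, the verbatim-substring matching rule, all helper names, and the toy inputs are hypothetical, and the paper's exact metric definitions may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass(frozen=True)
class Passage:
    doc_id: str  # source document the text comes from
    text: str    # a packed passage, or a quoted gold-evidence span


def fan_in(gold: list[Passage]) -> int:
    # Assumed definition: number of distinct source documents among the gold quotes.
    return len({q.doc_id for q in gold})


def _contains(p: Passage, q: Passage) -> bool:
    # Assumption: a packed passage "hits" a gold quote if it comes from the
    # same document and contains the quoted text verbatim.
    return p.doc_id == q.doc_id and q.text in p.text


def evidence_recall(pack: list[Passage], gold: list[Passage]) -> float:
    # Fraction of gold quotes covered by at least one packed passage.
    if not gold:
        raise ValueError("each question needs at least one gold quote")
    return sum(any(_contains(p, q) for p in pack) for q in gold) / len(gold)


def evidence_precision(pack: list[Passage], gold: list[Passage]) -> float:
    # Fraction of packed passages that contain at least one gold quote.
    if not pack:
        return 0.0
    return sum(any(_contains(p, q) for q in gold) for p in pack) / len(pack)


def recall_by_fan_in(items: list[tuple[list[Passage], list[Passage]]]) -> dict[int, float]:
    # Mean evidence recall at each fan-in level (one point per level of the gradient).
    buckets: dict[int, list[float]] = defaultdict(list)
    for pack, gold in items:
        buckets[fan_in(gold)].append(evidence_recall(pack, gold))
    return {k: mean(v) for k, v in sorted(buckets.items())}


def pearson_r(xs: list[float], ys: list[float]) -> float:
    # Plain Pearson correlation, e.g. per-system recall vs. answer correctness.
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slope(xs: list[float], ys: list[float]) -> float:
    # Least-squares slope; a more negative slope means faster degradation.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


if __name__ == "__main__":
    gold = [Passage("essay_03", "memory is a river"),
            Passage("essay_11", "the river never returns")]
    pack = [Passage("essay_03", "He wrote that memory is a river, always moving."),
            Passage("essay_07", "A near-miss passage about rivers.")]
    print(fan_in(gold), evidence_recall(pack, gold), evidence_precision(pack, gold))  # 2 0.5 0.5

    # Toy accuracy-vs-fan-in curves (made up, NOT results from the paper).
    levels = [1, 2, 3, 4]
    flat, thematic = [0.9, 0.7, 0.5, 0.3], [0.9, 0.8, 0.7, 0.6]
    print(round(slope(levels, flat) / slope(levels, thematic), 2))  # 2.0
```

Expressing "degrades 2-3x faster" as a ratio of least-squares slopes over fan-in levels is only one simple reading; the paper may quantify degradation differently.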


Related papers