Fugu-MT Paper Translation (Summary): Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles

Paper summary: Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles

arxiv url: http://arxiv.org/abs/2603.21193v1
Date: Sun, 22 Mar 2026 12:28:21 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.287023
Title: Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles
Authors: Sai Koneru, Jian Wu, Sarah Rajtmajer
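Abstract summary: The work studies a sequential full-text extraction setting in which the statement of a primary finding in an article's abstract is linked to the corresponding hypothesis statement in the paper body. Targeted context selection consistently improves hypothesis extraction relative to full-text prompting. Even with oracle paragraphs, performance remains moderate, indicating persistent extractor limitations in handling hybrid numeric-textual statements.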
Reference score (independently computed attention metric): 7.537972017257894
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Extracting hypotheses and their supporting statistical evidence from full-text scientific articles is central to the synthesis of empirical findings, but remains difficult due to document length and the distribution of scientific arguments across sections of the paper. The work studies a sequential full-text extraction setting, where the statement of a primary finding in an article's abstract is linked to (i) a corresponding hypothesis statement in the paper body and (ii) the statistical evidence that supports or refutes that hypothesis. This formulation induces a challenging within-document retrieval setting in which many candidate paragraphs are topically related to the finding but differ in rhetorical role, creating hard negatives for retrieval and extraction. Using a two-stage retrieve-and-extract framework, we conduct a controlled study of retrieval design choices, varying context quantity, context quality (standard Retrieval Augmented Generation, reranking, and a fine-tuned retriever paired with reranking), as well as an oracle paragraph setting to separate retrieval failures from extraction limits across four Large Language Model extractors. We find that targeted context selection consistently improves hypothesis extraction relative to full-text prompting, with gains concentrated in configurations that optimize retrieval quality and context cleanliness. In contrast, statistical evidence extraction remains substantially harder. Even with oracle paragraphs, performance remains moderate, indicating persistent extractor limitations in handling hybrid numeric-textual statements rather than retrieval failures alone.
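The two-stage retrieve-and-extract setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the bag-of-words cosine scorer stands in for the retrievers and rerankers studied in the paper, `extractor` is a hypothetical callable in place of an LLM, and the sample paragraphs are invented. The `oracle_index` argument mimics the oracle paragraph setting, where the gold paragraph is supplied directly so that extraction limits can be separated from retrieval failures.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(finding, paragraphs, k=2):
    """Stage 1 (assumed lexical stand-in): indices of the top-k paragraphs."""
    query = Counter(tokenize(finding))
    scored = [(cosine(query, Counter(tokenize(p))), i)
              for i, p in enumerate(paragraphs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [i for _, i in scored[:k]]


def build_prompt(finding, context, target):
    """Assemble the extractor prompt from the selected context only."""
    joined = "\n\n".join(context)
    return (f"Finding: {finding}\n\nContext:\n{joined}\n\n"
            f"Extract the {target} corresponding to the finding.")


def run_pipeline(finding, paragraphs, extractor, target="hypothesis",
                 k=2, oracle_index=None):
    """Stage 1 then stage 2; oracle_index bypasses retrieval entirely."""
    if oracle_index is not None:
        selected = [oracle_index]
    else:
        selected = retrieve(finding, paragraphs, k=k)
    prompt = build_prompt(finding, [paragraphs[i] for i in selected], target)
    return selected, extractor(prompt)


# Invented sample document: a hard negative (index 2) is topically related
# to the finding but plays a different rhetorical role than the hypothesis.
FINDING = "Sleep duration predicts exam scores"
PARAGRAPHS = [
    "We thank the reviewers for their comments.",
    "We hypothesize that sleep duration predicts exam scores among students.",
    "Regression showed sleep duration was associated with scores "
    "(beta = 0.31, p < .01).",
]


def echo_extractor(prompt):
    """Hypothetical placeholder for an LLM call; returns the prompt as-is."""
    return prompt
```

Calling `run_pipeline(FINDING, PARAGRAPHS, echo_extractor)` exercises the retrieval path, while passing `oracle_index=2` with `target="statistical evidence"` feeds only the gold paragraph to the extractor. Comparing the two runs is the kind of contrast the oracle setting is designed to expose.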
