Fugu-MT Paper Translation (Abstract): Negation is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA

Paper abstract: Negation is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA

arxiv url: http://arxiv.org/abs/2603.17580v1
Date: Wed, 18 Mar 2026 10:35:44 GMT
Status: Translation complete
System update date: 2026-03-19 18:32:57.650893
Title: Negation is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA
Title (reference translation): Negation Is Not Semantic: Diagnosing Dense Retrieval Failure Modes for Trade-offs in Contradiction-Aware Biomedical QA
Authors: Soumya Ranjan Sahoo, Gagan N., Sanand Sasidharan, Divya Bharti
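Abstract summary: Large language models (LLMs) have shown strong capabilities in question answering, but their tendency to generate unverifiable claims poses serious risks in clinical settings. To mitigate these risks, the TREC 2025 BioGen track mandates grounded answers that explicitly surface contradictory evidence. This paper presents a proxy-based development framework that uses the SciFact dataset to systematically optimize retrieval architectures.
Reference score (independently computed attention metric): 1.0330395403064265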
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in biomedical question answering, yet their tendency to generate plausible but unverified claims poses serious risks in clinical settings. To mitigate these risks, the TREC 2025 BioGen track mandates grounded answers that explicitly surface contradictory evidence (Task A) and the generation of narrative-driven, fully attributed responses (Task B). Addressing the absence of target ground truth, we present a proxy-based development framework using the SciFact dataset to systematically optimize retrieval architectures. Our iterative evaluation revealed a "Simplicity Paradox": complex adversarial dense retrieval strategies failed catastrophically at contradiction detection (MRR 0.023) due to Semantic Collapse, where negation signals become indistinguishable in vector space. We further identify a Retrieval Asymmetry: filtering dense embeddings improves contradiction detection but degrades support recall, compromising reliability. We resolve this via a Decoupled Lexical Architecture built on a unified BM25 backbone, balancing semantic support recall (0.810) with precise contradiction surfacing (0.750). This approach achieves the highest Weighted MRR (0.790) on the proxy benchmark while remaining the only viable strategy for scaling to the 30-million-document PubMed corpus. For answer generation, we introduce Narrative Aware Reranking and One-Shot In-Context Learning, improving citation coverage from 50% (zero-shot) to 100%. Official TREC results confirm our findings: our system ranks 2nd on Task A contradiction F1 and 3rd out of 50 runs on Task B citation coverage (98.77%), achieving a zero citation contradict rate. Our work transforms LLMs from stochastic generators into honest evidence synthesizers, showing that epistemic integrity in biomedical AI requires precision and architectural scalability over isolated metric optimization.
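Abstract (reference translation): Large language models (LLMs) have shown strong capabilities in biomedical question answering, but their tendency to produce unverifiable claims poses serious risks in clinical settings. To mitigate these risks, the TREC 2025 BioGen track requires grounded answers that explicitly surface contradictory evidence (Task A) and narrative-driven, fully attributed responses (Task B). This work proposes a proxy-based development framework that uses the SciFact dataset to systematically optimize retrieval architectures. The iterative evaluation revealed a "Simplicity Paradox": complex adversarial dense retrieval strategies failed catastrophically at contradiction detection (MRR 0.023) because of Semantic Collapse, in which negation signals become indistinguishable in vector space. Filtering dense embeddings improves contradiction detection but degrades support recall, which compromises reliability. The authors resolve this with a Decoupled Lexical Architecture built on a unified BM25 backbone, balancing semantic support recall (0.810) against precise contradiction surfacing (0.750). This approach achieves the highest Weighted MRR (0.790) on the proxy benchmark while remaining the only viable strategy for scaling to the 30-million-document PubMed corpus. For answer generation, they introduce Narrative Aware Reranking and One-Shot In-Context Learning, raising citation coverage from 50% (zero-shot) to 100%. Official TREC results confirm the findings: the system ranks 2nd on Task A contradiction F1 and 3rd out of 50 runs on Task B citation coverage (98.77%), with a zero citation contradict rate. The work turns LLMs from stochastic generators into honest evidence synthesizers and shows that epistemic integrity in biomedical AI requires precision and architectural scalability over isolated metric optimization.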
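
The abstract reports MRR and a Weighted MRR without giving their formulas. As a minimal sketch, assuming the standard definition of mean reciprocal rank and a simple convex combination of a support score and a contradiction score, the metrics could be computed as below. The function names, the `support_weight` parameter, and its default value are hypothetical; the paper's actual weighting is not stated in this abstract.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranking, rel) for ranking, rel in runs) / len(runs)


def weighted_mrr(support_mrr: float, contradiction_mrr: float,
                 support_weight: float = 0.5) -> float:
    """Convex combination of the two scores.

    ASSUMPTION: the weighting scheme is illustrative only; the abstract
    does not say how the paper's Weighted MRR is defined.
    """
    if not 0.0 <= support_weight <= 1.0:
        raise ValueError("support_weight must lie in [0, 1]")
    return support_weight * support_mrr + (1.0 - support_weight) * contradiction_mrr
```

Keeping the support and contradiction rankings as separate inputs mirrors the decoupled evaluation the abstract describes, so a gain on one side cannot hide a collapse on the other.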


Related papers