Fugu-MT Paper Translation (Overview): Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG

Paper overview: Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG

arxiv url: http://arxiv.org/abs/2605.27105v2
Date: Wed, 27 May 2026 08:04:02 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.162126
Title: Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG
Authors: Jorge Gabín, Anxo Perez, Javier Parapar
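Abstract summary: Retrieval-Augmented Generation (RAG) systems rely on retrieved documents being fed into the model's input context. Prior work has reported position-based effects such as "lost in the middle" and related long-context phenomena. The authors show that topic sampling is a major source of variance and that small topic sets can exaggerate ordering effects. They also examine a more realistic RAG scenario.
Reference score (in-house attention score): 3.597778914286147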
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Retrieval-Augmented Generation (RAG) systems rely on retrieved documents being concatenated into a model's input context, making both document ordering and context size critical yet controversial design choices. Prior work reports position-based effects such as lost in the middle and related long-context phenomena. However, empirical findings remain inconsistent and hard to reproduce across models, datasets, and evaluation protocols. In this paper, we present a systematic reproducibility study that revisits these claims and examines how they evolve with contemporary LLMs under a controlled evaluation framework. We first show that topic sampling is a major source of variance: small topic sets can mask or exaggerate ordering effects. Based on repeated subset sampling across multiple topic budgets, we provide a practical calibration procedure that identifies topic counts yielding stable trends at feasible cost. Using these fixed topic sets, we then reproduce and extend results on position sensitivity, re-evaluating lost in the middle and positional biases in modern LLMs. Then, we also study a more realistic RAG scenario in which relevance is mediated by a retriever rather than oracle access to ground-truth documents. In this setting, we re-examine a recent industry study and trace its discrepancies to evaluation choices such as limited topic coverage and reliance on LLM-based judges. Finally, we conduct an analysis of how retrieval order and context size affect downstream LLM performance under imperfect retrieval. Our results demonstrate that both factors interact strongly with retrieval quality and model choice, and that conclusions drawn from idealised setups do not always transfer to real-world RAG pipelines. We release all code and configurations to support reproducibility and future work on robust RAG evaluation.
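
The abstract describes the topic-budget calibration only at a high level (repeated subset sampling across several topic budgets). The Python sketch below is a hypothetical illustration, not the authors' released code. It assumes per-topic scores are already available for each gold-document position. It uses the largest per-position standard deviation across resampled subsets as the stability criterion, which is an assumption on our part. The names `place_gold`, `position_curve`, and `calibrate_topic_budget`, plus the `tol` threshold, are introduced purely for illustration. `place_gold` shows how a position sweep for a lost-in-the-middle style experiment might build each context.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence

# Hypothetical data layout (an assumption, not the paper's format):
# scores[topic][p] is the downstream score for `topic` when the gold
# document sits at 0-based position p of the context.
Scores = Dict[str, List[float]]


def place_gold(gold: str, distractors: Sequence[str], position: int) -> List[str]:
    """Insert the gold document at 0-based `position` among the distractors."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position must be between 0 and len(distractors)")
    docs = list(distractors)
    docs.insert(position, gold)
    return docs


def position_curve(scores: Scores, topics: Sequence[str]) -> List[float]:
    """Mean score at each gold position, averaged over `topics`."""
    n_positions = len(scores[topics[0]])
    return [statistics.mean(scores[t][p] for t in topics) for p in range(n_positions)]


def calibrate_topic_budget(
    scores: Scores,
    budgets: Sequence[int],
    n_resamples: int = 200,
    tol: float = 0.02,
    seed: int = 0,
) -> Optional[int]:
    """Return the smallest topic budget whose resampled position curves are stable.

    For each budget k, draw `n_resamples` random k-topic subsets, compute the
    position curve for each subset, and take the largest per-position standard
    deviation across subsets. The first k with spread below `tol` is returned.
    None means no budget met this (assumed) stability criterion.
    """
    rng = random.Random(seed)
    all_topics = sorted(scores)
    for k in sorted(budgets):
        if k < 1:
            continue  # an empty subset has no curve
        if k > len(all_topics):
            break  # cannot sample more topics than exist
        curves = [
            position_curve(scores, rng.sample(all_topics, k))
            for _ in range(n_resamples)
        ]
        spread = max(
            statistics.pstdev(curve[p] for curve in curves)
            for p in range(len(curves[0]))
        )
        if spread < tol:
            return k
    return None
```

For example, given per-topic scores for each gold position, `calibrate_topic_budget(scores, budgets=[25, 50, 100, 200])` would return the smallest listed topic count that meets the threshold. The budgets and the `tol` value here are placeholders, not figures from the paper. A coarse criterion like the maximum spread keeps the sketch short. A closer match to "stable trends" might instead check whether the shape of the curve (for example, which position scores worst) stays the same across resamples.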

Related papers