Fugu-MT Paper Translation (Abstract): SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

Paper overview: SPIRE: Structure-Preserving Interpretable Retrieval of Evidence

arxiv url: http://arxiv.org/abs/2604.20849v1
Date: Thu, 12 Feb 2026 03:46:10 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.104725
Title: SPIRE: Structure-Preserving Interpretable Retrieval of Evidence
Title (reference translation): SPIRE: Structure-Preserving Interpretable Retrieval of Evidence
Authors: Mike Rainey, Umut Acar, Muhammed Sezer
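Abstract summary: We propose a structure-aware retrieval pipeline that operates over tree-structured documents. We define a small set of document primitives, including paths and path sets. Global contextualization adds the non-local scaffolding needed to make a selection intelligible. Local contextualization expands a seed selection within its structural neighborhood to obtain a compact, context-rich view.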
Reference score (independently computed attention score): 0.09558392439655013
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it difficult to return small, citation-ready evidence without losing the surrounding context that makes it interpretable. We present a structure-aware retrieval pipeline that operates over tree-structured documents. The core idea is to represent candidates as subdocuments: precise, addressable selections that preserve structural identity while deferring the choice of surrounding context. We define a small set of document primitives--paths and path sets, subdocument extraction by pruning, and two contextualization mechanisms. Global contextualization adds the non-local scaffolding needed to make a selection intelligible (e.g., titles, headers, list and table structure). Local contextualization expands a seed selection within its structural neighborhood to obtain a compact, context-rich view under a target budget. Building on these primitives, we describe an embedding-based candidate generator that indexes sentence-seeded subdocuments and a query-time, document-aware aggregation step that amortizes shared structural context. We then introduce a contextual filtering stage that re-scores retrieved candidates using locally contextualized views. Across experiments on HTML question-answering benchmarks, we find that preserving structure while contextualizing selections yields higher-quality, more diverse citations under fixed budgets than strong passage-based baselines, while maintaining scalability.
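Abstract (reference translation): Retrieval-augmented generation over semi-structured sources such as HTML is constrained by a mismatch between document structure and the flat, sequence-based interfaces of today's embedding and generative models. Retrieval pipelines often linearize documents into fixed-size chunks before indexing, which obscures section structure, lists, and tables, and makes it hard to return small, citation-ready evidence without losing the surrounding context that makes it interpretable. We propose a structure-aware retrieval pipeline that operates over tree-structured documents. The central idea is to represent candidates as subdocuments: precise, addressable selections that preserve structural identity while deferring the choice of surrounding context. We define a small set of document primitives: paths and path sets, subdocument extraction by pruning, and two contextualization mechanisms. Global contextualization adds the non-local scaffolding (titles, headers, list and table structure, and so on) needed to make a selection intelligible. Local contextualization expands a seed selection within its structural neighborhood to obtain a compact, context-rich view under a target budget. Building on these primitives, we describe an embedding-based candidate generator that indexes sentence-seeded subdocuments, and a query-time, document-aware aggregation step that amortizes shared structural context. We then introduce a contextual filtering stage that re-scores retrieved candidates using locally contextualized views. In experiments on HTML question-answering benchmarks, preserving structure while contextualizing selections yields higher-quality, more diverse citations under fixed budgets than strong passage-based baselines, while maintaining scalability.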
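
The primitives named in the abstract (paths and path sets, subdocument extraction by pruning, global and local contextualization) can be illustrated with a small Python sketch. This is not the authors' implementation: the `Node` type, the `SCAFFOLD_TAGS` set, the character-count budget, and the nearest-sibling-first expansion rule are all assumptions made here for illustration, and the sketch assumes that headings are nested inside their sections.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Path = Tuple[int, ...]  # child indices from the root; () addresses the root

# Hypothetical choice of tags that count as non-local scaffolding.
SCAFFOLD_TAGS = {"title", "h1", "h2", "h3", "caption", "thead"}


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def get(root: Node, path: Path) -> Node:
    node = root
    for i in path:
        node = node.children[i]
    return node


def size(node: Node) -> int:
    """Cost of a subtree, measured here as its total character count."""
    return len(node.text) + sum(size(c) for c in node.children)


def texts(node: Node) -> List[str]:
    """Flatten a (sub)document into its text fragments, in document order."""
    out = [node.text] if node.text else []
    for child in node.children:
        out.extend(texts(child))
    return out


def prune(root: Node, keep: Set[Path], path: Path = ()) -> Optional[Node]:
    """Extract a subdocument: selected nodes keep their whole subtree,
    ancestors survive as bare containers, everything else is dropped."""
    if path in keep:
        return root
    kept = []
    for i, child in enumerate(root.children):
        sub = prune(child, keep, path + (i,))
        if sub is not None:
            kept.append(sub)
    return Node(root.tag, "", kept) if kept else None


def global_context(root: Node, keep: Set[Path]) -> Set[Path]:
    """Add scaffold-tagged children of every ancestor of each selection."""
    out = set(keep)
    for p in keep:
        node = root
        for depth, idx in enumerate(p):
            for i, child in enumerate(node.children):
                if child.tag in SCAFFOLD_TAGS:
                    out.add(p[:depth] + (i,))
            node = node.children[idx]
    return out


def local_context(root: Node, seed: Path, budget: int) -> Set[Path]:
    """Greedily grow the seed with its nearest siblings, then its parent's
    siblings, and so on, stopping at the first node that would exceed budget."""
    keep = {seed}
    used = size(get(root, seed))
    path = seed
    while path:
        parent, here = path[:-1], path[-1]
        siblings = get(root, parent).children
        others = sorted((i for i in range(len(siblings)) if i != here),
                        key=lambda i: abs(i - here))
        for i in others:
            cost = size(siblings[i])
            if used + cost > budget:
                return keep
            keep.add(parent + (i,))
            used += cost
        path = parent
    return keep


doc = Node("html", children=[
    Node("title", "Pricing guide"),
    Node("section", children=[
        Node("h2", "Plans"),
        Node("p", "We offer two plans."),
        Node("ul", children=[Node("li", "Basic costs 5 USD."),
                             Node("li", "Pro costs 20 USD.")]),
    ]),
    Node("section", children=[Node("h2", "Support"),
                              Node("p", "Email us any time.")]),
])
SEED: Path = (1, 2, 1)  # the "Pro costs 20 USD." list item

print(texts(prune(doc, global_context(doc, {SEED}))))
print(texts(prune(doc, local_context(doc, SEED, budget=40))))
```

In this toy document, global contextualization attaches the page title and the enclosing section heading to the seed sentence, while local contextualization pulls in the neighboring list item and stops before the preceding paragraph because it would exceed the 40-character budget. Keeping the selection as a path set until the final `prune` call mirrors the abstract's idea of deferring the choice of surrounding context.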


Related papers