Fugu-MT Paper Translation (Abstract): Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems

Paper abstract: Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems

arxiv url: http://arxiv.org/abs/2604.17677v1
Date: Mon, 20 Apr 2026 00:24:34 GMT
Status: Translation complete
System update date: 2026-04-21 21:52:52.633686
Title: Semantic Entanglement in Vector-Based Retrieval: A Formal Framework and Context-Conditioned Disentanglement Pipeline for Agentic RAG Systems
Authors: Nick Loghmani
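Abstract summary: We formalize semantic entanglement as a model-relative measure of cross-topic overlap in embedding space. We introduce the Semantic Disentanglement Pipeline (SDP), a four-stage preprocessing framework that restructures documents prior to embedding. We evaluate SDP on a real-world enterprise healthcare knowledge base comprising over 2,000 documents across approximately 25 sub-domains.
Reference score (independently computed attention score): 0.0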
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval-Augmented Generation (RAG) systems depend on the geometric properties of vector representations to retrieve contextually appropriate evidence. When source documents interleave multiple topics within contiguous text, standard vectorization produces embedding spaces in which semantically distinct content occupies overlapping neighborhoods. We term this condition semantic entanglement. We formalize entanglement as a model-relative measure of cross-topic overlap in embedding space and define an Entanglement Index (EI) as a quantitative proxy. We argue that higher EI constrains attainable Top-K retrieval precision under cosine similarity retrieval. To address this, we introduce the Semantic Disentanglement Pipeline (SDP), a four-stage preprocessing framework that restructures documents prior to embedding. We further propose context-conditioned preprocessing, in which document structure is shaped by patterns of operational use, and a continuous feedback mechanism that adapts document structure based on agent performance. We evaluate SDP on a real-world enterprise healthcare knowledge base comprising over 2,000 documents across approximately 25 sub-domains. Top-K retrieval precision improves from approximately 32% under fixed-token chunking to approximately 82% under SDP, while mean EI decreases from 0.71 to 0.14. We do not claim that entanglement fully explains RAG failure, but that it captures a distinct preprocessing failure mode that downstream optimization cannot reliably correct once encoded into the vector space.
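Illustrative sketch: the abstract relates cross-topic overlap in embedding space to Top-K precision under cosine similarity retrieval, but it does not give the formula for the Entanglement Index. The Python below is therefore only a toy illustration under stated assumptions. The function neighbor_mixing is a hypothetical stand-in proxy (the mean fraction of each chunk's k nearest neighbors that carry a different topic label), not the paper's EI, and the 2-D vectors and topic labels are made-up data rather than results from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, vectors, k):
    """Indices of the k vectors most cosine-similar to the query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine(query, vectors[i]),
                   reverse=True)
    return order[:k]

def precision_at_k(query, query_topic, vectors, topics, k):
    """Fraction of the Top-K retrieved chunks whose topic matches the query's."""
    hits = top_k(query, vectors, k)
    return sum(topics[i] == query_topic for i in hits) / k

def neighbor_mixing(vectors, topics, k):
    """Hypothetical entanglement proxy (an assumption, NOT the paper's EI):
    mean fraction of each chunk's k nearest neighbors with a different topic."""
    total = 0.0
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        nearest = sorted(others,
                         key=lambda j: cosine(v, vectors[j]),
                         reverse=True)[:k]
        total += sum(topics[j] != topics[i] for j in nearest) / k
    return total / len(vectors)

def unit(deg):
    """Toy 2-D 'embedding' on the unit circle at the given angle in degrees."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]

# Made-up corpus 1: topics occupy separate angular regions.
separated = [unit(d) for d in (0, 5, 10, 80, 85, 90)]
topics_sep = ["A", "A", "A", "B", "B", "B"]

# Made-up corpus 2: topics alternate, so neighborhoods overlap.
entangled = [unit(d) for d in (0, 10, 20, 30, 40, 50)]
topics_ent = ["A", "B", "A", "B", "A", "B"]

query = unit(0)  # a query that belongs to topic "A"
for name, vecs, tops in (("separated", separated, topics_sep),
                         ("entangled", entangled, topics_ent)):
    print(name,
          "mixing:", round(neighbor_mixing(vecs, tops, k=2), 3),
          "precision@2:", precision_at_k(query, "A", vecs, tops, k=2))
```

In this toy setup the corpus with interleaved topics shows both a higher mixing score and a lower precision@2 than the separated corpus, which is the qualitative relationship the abstract argues for. It says nothing about the magnitudes reported in the paper.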

Related papers