Fugu-MT Paper Translation (Abstract): SRAG: RAG with Structured Data Improves Vector Retrieval

Paper summary: SRAG: RAG with Structured Data Improves Vector Retrieval

arxiv url: http://arxiv.org/abs/2603.26670v1
Date: Tue, 27 Jan 2026 07:27:56 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:13.053537
Title: SRAG: RAG with Structured Data Improves Vector Retrieval
Authors: Shalin Shah, Srikanth Ryali, Ramasubbu Venkatesh
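Abstract summary: Retrieval Augmented Generation (RAG) provides the necessary informational grounding to LLMs. RAG could also use knowledge graph triples as a means of providing factual information to an LLM. This paper proposes Structured RAG (SRAG), which adds structured information to a query as well as the chunks in the form of topics, sentiments, and query and chunk types.
Reference score (independently computed attention metric): 1.1288006309687828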
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Retrieval Augmented Generation (RAG) provides the necessary informational grounding to LLMs in the form of chunks retrieved from a vector database or through web search. RAG could also use knowledge graph triples as a means of providing factual information to an LLM. However, the retrieval is only based on representational similarity between a question and the contents. The performance of RAG depends on the numeric vector representations of the query and the chunks. To improve these representations, we propose Structured RAG (SRAG), which adds structured information to a query as well as the chunks in the form of topics, sentiments, query and chunk types (e.g., informational, quantitative), knowledge graph triples and semantic tags. Experiments indicate that this method significantly improves the retrieval process. Using GPT-5 as an LLM-as-a-judge, results show that the method improves the score given to answers in a question answering system by 30% (p-value = 2e-13) (with tighter bounds). The strongest improvement is in comparative, analytical and predictive questions. The results suggest that our method enables broader, more diverse, and episodic-style retrieval. Tail risk analysis shows that SRAG attains very large gains more often, with losses remaining minor in magnitude.
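The idea in the abstract, attaching structured fields to both the query and the chunks before they are embedded, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`augment`, `embed`, `cosine`, `retrieve`), the "key: value" serialization of the structured fields, the hand-written topic and type labels, and the toy bag-of-words vector that stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def augment(text, meta):
    """Append structured fields to the text as 'key: value' lines (assumed format)."""
    fields = [f"{key}: {', '.join(values)}" for key, values in sorted(meta.items())]
    return text + "\n" + "\n".join(fields)


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunk_vecs, k=1):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical chunks and query; the labels would normally come from a tagging step.
chunks = [
    ("Revenue rose 12 percent in the third quarter.",
     {"topic": ["finance"], "type": ["quantitative"]}),
    ("The company was founded in Oslo.",
     {"topic": ["history"], "type": ["informational"]}),
]
query = ("How much did sales grow?",
         {"topic": ["finance"], "type": ["quantitative"]})

# Baseline: embed the raw text only.
plain_scores = [cosine(embed(query[0]), embed(text)) for text, _ in chunks]

# Structured variant: embed text plus its structured fields.
query_vec = embed(augment(*query))
chunk_vecs = [embed(augment(text, meta)) for text, meta in chunks]
structured_scores = [cosine(query_vec, vec) for vec in chunk_vecs]
top = retrieve(query_vec, chunk_vecs, k=1)

print(plain_scores, structured_scores, top)
```

In this toy setup the raw query shares no words with either chunk, so the baseline cannot separate them; the structured fields are what give the finance chunk a higher score. With a learned embedding model the effect would be less direct, and the abstract does not say how the authors serialize or combine the fields.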

Related papers