Fugu-MT Paper Translation (Abstract): Context-Adaptive Synthesis and Compression for Enhanced Retrieval-Augmented Generation in Complex Domains

Paper overview: Context-Adaptive Synthesis and Compression for Enhanced Retrieval-Augmented Generation in Complex Domains

arxiv url: http://arxiv.org/abs/2508.19357v1
Date: Tue, 26 Aug 2025 18:34:20 GMT
Status: Translation complete
Last updated in system: 2025-08-28 19:07:41.395612
Title: Context-Adaptive Synthesis and Compression for Enhanced Retrieval-Augmented Generation in Complex Domains
Authors: Peiran Zhou, Junnan Zhu, Yichen Shen, Ruoxi Yu
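Abstract summary: Large language models excel at language tasks but are prone to hallucinations and outdated knowledge. Retrieval-Augmented Generation mitigates these by grounding LLMs in external knowledge. The paper proposes CASC, a new framework that intelligently processes retrieved contexts.
Reference score (independently computed attention score): 4.6053294562865625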
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) excel in language tasks but are prone to hallucinations and outdated knowledge. Retrieval-Augmented Generation (RAG) mitigates these by grounding LLMs in external knowledge. However, in complex domains involving multiple, lengthy, or conflicting documents, traditional RAG suffers from information overload and inefficient synthesis, leading to inaccurate and untrustworthy answers. To address this, we propose CASC (Context-Adaptive Synthesis and Compression), a novel framework that intelligently processes retrieved contexts. CASC introduces a Context Analyzer & Synthesizer (CAS) module, powered by a fine-tuned smaller LLM, which performs key information extraction, cross-document consistency checking and conflict resolution, and question-oriented structured synthesis. This process transforms raw, scattered information into a highly condensed, structured, and semantically rich context, significantly reducing the token count and cognitive load for the final Reader LLM. We evaluate CASC on SciDocs-QA, a new challenging multi-document question answering dataset designed for complex scientific domains with inherent redundancies and conflicts. Our extensive experiments demonstrate that CASC consistently outperforms strong baselines.
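The three CAS steps named in the abstract (key information extraction, cross-document consistency checking, and question-oriented structured synthesis) can be illustrated with a toy Python sketch. This is not the authors' implementation: the paper describes a fine-tuned smaller LLM, whereas the sketch below substitutes simple word-overlap and negation heuristics purely for illustration. All names (`Claim`, `extract_key_sentences`, `find_conflicts`, `synthesize`) and parameters (`top_k`, `min_shared`) are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: a tiny negation list stands in for real conflict detection.
NEGATIONS = {"not", "no", "never"}


@dataclass
class Claim:
    doc_id: str
    sentence: str
    score: float  # fraction of question words found in the sentence


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_key_sentences(question, docs, top_k=1):
    """Step 1 (toy): keep the top_k sentences per document by question-word overlap."""
    q = tokenize(question)
    claims = []
    for doc_id, text in docs.items():
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        scored = [Claim(doc_id, s, len(q & tokenize(s)) / max(len(q), 1)) for s in sentences]
        scored = [c for c in scored if c.score > 0]
        scored.sort(key=lambda c: c.score, reverse=True)
        claims.extend(scored[:top_k])
    return claims


def find_conflicts(claims, min_shared=3):
    """Step 2 (toy): flag cross-document pairs that share words but differ in negation."""
    conflicts = []
    for i, a in enumerate(claims):
        for b in claims[i + 1:]:
            if a.doc_id == b.doc_id:
                continue
            ta, tb = tokenize(a.sentence), tokenize(b.sentence)
            shared = (ta & tb) - NEGATIONS
            if len(shared) >= min_shared and bool(ta & NEGATIONS) != bool(tb & NEGATIONS):
                conflicts.append((a, b))
    return conflicts


def synthesize(question, docs, top_k=1):
    """Step 3 (toy): build a condensed, structured context for a downstream reader model."""
    claims = extract_key_sentences(question, docs, top_k)
    conflicts = find_conflicts(claims)
    lines = [f"Question: {question}", "Evidence:"]
    lines += [f"- [{c.doc_id}] {c.sentence}" for c in claims]
    if conflicts:
        lines.append("Conflicts:")
        lines += [f"- [{a.doc_id}] vs [{b.doc_id}]" for a, b in conflicts]
    return "\n".join(lines)
```

Calling `synthesize(question, docs)` with a dict that maps document IDs to raw text returns a short plain-text context containing only the highest-overlap sentence from each document, plus a list of any flagged disagreements. In a system like the one the abstract describes, that condensed text, rather than the full retrieved documents, would be passed to the Reader LLM.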
