Fugu-MT Paper Translation (Abstract): LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

Paper overview: LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG

arxiv url: http://arxiv.org/abs/2605.06285v1
Date: Thu, 07 May 2026 13:56:13 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.866843
Title: LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
Authors: Yijia Zheng, Marcel Worring
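Abstract summary: LatentRAG is a new framework that shifts both reasoning and retrieval from discrete language space to continuous latent space. It achieves performance comparable to explicit agentic RAG methods while reducing inference latency by approximately 90%.
Reference score (attention score computed independently by this site): 13.420568360763227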
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question answering tasks but struggles with complex questions. Agentic RAG extends this paradigm by replacing single-step retrieval with a multi-step process, in which the large language model (LLM) acts as a search agent that generates intermediate thoughts and subqueries to iteratively interact with the retrieval system. This iterative process incurs substantial latency due to the autoregressive generation of lengthy thoughts and subqueries. To address this limitation, we propose LatentRAG, a novel framework that shifts both reasoning and retrieval from discrete language space to continuous latent space. Unlike existing explicit methods that generate natural language thoughts or subqueries token-by-token, LatentRAG produces latent tokens for thoughts and subqueries directly from the hidden states in a single forward pass. We align LLMs with dense retrieval models in the latent space, enabling retrieval over latent subquery tokens and supporting end-to-end joint optimization. To improve transparency and encourage semantically meaningful latent representations, we incorporate a parallel latent decoding mechanism that translates latent tokens back into natural language. Extensive experiments on seven benchmark datasets show that LatentRAG achieves performance comparable to explicit agentic RAG methods while reducing inference latency by approximately 90%, substantially narrowing the latency gap with traditional single-step RAG.
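The abstract describes retrieval over latent subquery tokens that come from the LLM's hidden states and are aligned with a dense retriever. The sketch below illustrates only that general idea with toy numbers; it is not the authors' implementation. The names `project`, `retrieve`, and `alignment`, the vector sizes, and the use of a single linear projection with cosine similarity are all assumptions made for illustration.

```python
import math

def project(hidden, weight):
    """Map a hidden-state vector into the retriever's embedding space.

    `weight` is a hypothetical alignment matrix (rows = output dims);
    the paper's actual alignment mechanism is not specified here.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

def normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def retrieve(latent_query, doc_embeddings, k=2):
    """Return indices of the k documents most similar to the latent query."""
    q = normalize(latent_query)
    scores = [
        sum(a * b for a, b in zip(q, normalize(d))) for d in doc_embeddings
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy stand-ins: a 4-dim "hidden state", a 2x4 alignment matrix,
# and three 2-dim document embeddings (all values are made up).
hidden_state = [1.0, 0.0, 0.0, 0.0]
alignment = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0]]
docs = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]

latent_query = project(hidden_state, alignment)
top = retrieve(latent_query, docs, k=2)
print(top)
```

In this sketch no subquery text is ever generated: the query vector is computed directly from the hidden state, which is the source of the latency saving the abstract claims for skipping token-by-token decoding.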

