Fugu-MT Paper Translation (Summary): Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters

Paper summary: Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters

arxiv url: http://arxiv.org/abs/2511.17044v1
Date: Fri, 21 Nov 2025 08:44:21 GMT
Status: Translation complete
System update date: 2025-11-24 18:08:18.941521
Title: Parametric Retrieval-Augmented Generation using Latent Routing of LoRA Adapters
Title (reference translation): Parametric Retrieval-Augmented Generation Using Latent Routing of LoRA Adapters
Authors: Zhan Su, Fengran Mo, Jian-yun Nie
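Abstract summary: Parametric Retrieval-Augmented Generation (PRAG) integrates external knowledge directly into a Large Language Model (LLM). Current PRAG approaches adopt a one-to-one document encoding scheme, using a dedicated LoRA adapter for each individual document. This paper proposes a new paradigm for encoding passages in PRAG that uses a latent routing encoding process.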
Reference score (independently computed attention score): 27.694134466842502
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parametric Retrieval-Augmented Generation (PRAG) is a novel RAG paradigm that integrates external knowledge directly into a Large Language Model (LLM) by parameterizing documents using LoRA adapters, demonstrating reduced inference costs compared to traditional RAG approaches. However, current PRAG approaches adopt a one-to-one document encoding scheme, using a dedicated LoRA adapter for each individual document. This scheme introduces two major limitations: First, it leads to data scarcity, as the training datasets for individual LoRA adapters are limited. Second, it incurs high overhead during inference, requiring the merging of LLM weights with a new LoRA adapter for every candidate passage, which is computationally inefficient. To overcome these challenges, we propose a novel paradigm for encoding passages in PRAG that utilizes a latent routing encoding process (Poly-PRAG). During offline encoding, we treat the encoding of a set of documents as a multi-task learning process, where each passage is assigned a unique task identifier. By employing a routing function, we use a small set of latent LoRA adapters to encode the entire passage space. During online inference, this routing function selectively activates a subset of latent experts based on the input query. We conduct comprehensive evaluations of Poly-PRAG across multiple knowledge-intensive NLP tasks. Our extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art results on four distinct datasets.
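Abstract (reference translation): Parametric Retrieval-Augmented Generation (PRAG) is a new RAG paradigm that integrates external knowledge directly into a Large Language Model (LLM). However, current PRAG approaches adopt a one-to-one document encoding scheme, using a dedicated LoRA adapter for each individual document. This scheme has two limitations. First, it leads to data scarcity, because the training dataset for each individual LoRA adapter is limited. Second, it incurs high overhead during inference, since LLM weights must be merged with a new LoRA adapter, which is computationally inefficient. To overcome these challenges, we propose a new paradigm for encoding passages in PRAG that uses a latent routing encoding process (Poly-PRAG). During offline encoding, the encoding of a set of documents is treated as a multi-task learning process in which each passage is assigned a unique task identifier. With a routing function, a small set of latent LoRA adapters encodes the entire passage space. During online inference, this routing function selectively activates a subset of latent experts based on the input query. We conduct a comprehensive evaluation of Poly-PRAG on multiple knowledge-intensive NLP tasks. Experiments verify the effectiveness of the proposed method, which achieves state-of-the-art results on four distinct datasets.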
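
The routing idea in the abstract (a small set of latent LoRA adapters shared across all passages, with a routing function that activates only a subset) can be illustrated with a minimal sketch. This is not the authors' implementation: the top-k selection, the softmax normalization, the function names (`route`, `mixed_delta`), the dimensions, and the random values standing in for learned parameters are all assumptions made for illustration. The abstract also does not say how a query is mapped to routing weights at inference time, so the sketch simply looks up routing logits by a passage identifier.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def route(logits, top_k):
    """Keep the top_k largest logits and softmax over them; all other weights are 0."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def mixed_delta(weights, latent_a, latent_b):
    """Return sum_k weights[k] * (B_k @ A_k), a low-rank weight update for one passage."""
    d_out, d_in = len(latent_b[0]), len(latent_a[0][0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for w, a, b in zip(weights, latent_a, latent_b):
        if w == 0.0:
            continue  # inactive latent adapter: skipped entirely
        ba = matmul(b, a)
        for r in range(d_out):
            for c in range(d_in):
                delta[r][c] += w * ba[r][c]
    return delta


# Hypothetical sizes; a real model would use far larger dimensions.
NUM_LATENT, RANK, D_IN, D_OUT, NUM_PASSAGES, TOP_K = 4, 2, 3, 3, 6, 2
rng = random.Random(0)


def rand_matrix(rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Shared latent adapters: A_k is RANK x D_IN, B_k is D_OUT x RANK.
# Random placeholders here; in training these would be learned.
latent_a = [rand_matrix(RANK, D_IN, 0.1) for _ in range(NUM_LATENT)]
latent_b = [rand_matrix(D_OUT, RANK, 0.1) for _ in range(NUM_LATENT)]
# One row of routing logits per passage (the "task identifier" in the abstract).
routing_logits = rand_matrix(NUM_PASSAGES, NUM_LATENT, 1.0)

passage_id = 3
weights = route(routing_logits[passage_id], TOP_K)
delta_w = mixed_delta(weights, latent_a, latent_b)
```

Under these assumptions, storage grows with the number of latent adapters plus one small routing vector per passage, rather than with one full adapter per passage, which is the contrast with the one-to-one scheme that the abstract draws.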

Related papers