Fugu-MT Paper Translation (Abstract): Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

Paper summary: Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

arxiv url: http://arxiv.org/abs/2604.23336v1
Date: Sat, 25 Apr 2026 14:45:33 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.278192
Title: Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
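Title (reference translation): Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA
Authors: Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji
Abstract summary: Rationale-based retrieval usually requires cross-encoding of query-document pairs. Rabtriever encodes queries and documents independently while providing cross query-document comprehension comparable to that of rerankers. Rabtriever generalizes well to traditional benchmarks such as MS MARCO and BEIR.
Reference score (independently computed attention score): 8.95939511590498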
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Unlike traditional fact-based retrieval, rationale-based retrieval typically requires cross-encoding of query-document pairs with large language models, which incurs substantial computational cost. To address this limitation, we propose Rabtriever, which encodes queries and documents independently while providing cross query-document comprehension comparable to that of rerankers. We start by training an LLM-based generative reranker, which places the document before the query and prompts the LLM to generate the relevance score via log probabilities. We then employ it as the teacher in an on-policy distillation framework, with Rabtriever as the student that reconstructs the teacher's context-aware query embedding. To achieve this, Rabtriever is first initialized from the teacher, with its parameters frozen. The Joint-Embedding Predictive Architecture (JEPA) paradigm is then adopted: a lightweight, trainable predictor is inserted between the LLM layers and heads, projecting the query embedding into a new hidden space with the document embedding as the latent vector. JEPA then minimizes the distribution difference between this projected embedding and the teacher embedding. To improve the sampling efficiency of on-policy distillation, we also add an auxiliary loss on the reverse KL of the LLM logits to reshape the student's logit distribution. Rabtriever reduces the teacher's quadratic complexity in document length to linear, which is verified both theoretically and empirically. Experiments show that Rabtriever outperforms a range of retriever baselines across diverse rationale-based tasks, including empathetic conversations and robotic manipulations, with only minor accuracy degradation relative to the reranker. Rabtriever also generalizes well to traditional retrieval benchmarks such as MS MARCO and BEIR, with performance comparable to the best retriever baseline.
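Abstract (reference translation): Unlike traditional fact-based retrieval, rationale-based retrieval usually requires cross-encoding of query-document pairs with a large language model, which incurs considerable computational cost. To address this limitation, we propose Rabtriever, which encodes queries and documents independently. We first train an LLM-based generative reranker that places the document before the query and is prompted to produce a relevance score from log probabilities. We then use it as the teacher in an on-policy distillation framework, with Rabtriever as the student reconstructing the teacher's context-aware query embedding. To achieve this, Rabtriever is first initialized from the teacher, and its parameters are frozen. Next, the Joint-Embedding Predictive Architecture (JEPA) paradigm is adopted: a lightweight, trainable predictor is integrated between the LLM layers and heads, projecting the query embedding into a new hidden space with the document embedding as the latent vector. JEPA then minimizes the distribution difference between this projected embedding and the teacher embedding. To improve the sampling efficiency of on-policy distillation, an auxiliary loss on the reverse KL of the LLM logits is added to reshape the student's logit distribution. Rabtriever reduces the teacher's quadratic complexity in document length to linear, verified both theoretically and empirically. Experiments show that Rabtriever outperforms retriever baselines across a variety of rationale-based tasks, including empathetic conversations and robotic manipulations, with only a minor drop in accuracy relative to the reranker. Rabtriever also generalizes well to traditional retrieval benchmarks such as MS MARCO and BEIR, with performance comparable to the best retriever baseline.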
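The two training signals named in the abstract (matching the projected query embedding to the teacher embedding, plus an auxiliary reverse KL on the logits) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the measure used for the embedding term, so mean squared error is assumed here as a stand-in, and `total_loss` and `aux_weight` are hypothetical names and values chosen for illustration only.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher): the expectation is taken under the student."""
    log_p_s = log_softmax(student_logits)
    log_p_t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_p_s, log_p_t))

def embedding_mse(predicted, target):
    """Assumed stand-in for the embedding-matching term (measure not given in the abstract)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def total_loss(predicted_emb, teacher_emb, student_logits, teacher_logits, aux_weight=0.1):
    """Hypothetical combined objective; aux_weight is illustrative, not from the paper."""
    main = embedding_mse(predicted_emb, teacher_emb)
    aux = reverse_kl(student_logits, teacher_logits)
    return main + aux_weight * aux

# Usage: toy vectors standing in for embeddings and vocabulary logits.
loss = total_loss([0.1, 0.4], [0.0, 0.5], [2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
print(f"toy loss: {loss:.4f}")
```

Reverse KL weights the log-ratio by the student's own probabilities, so it penalizes the student for placing mass where the teacher places little. That matches the on-policy setting described above, where the student's own outputs drive training.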

Related papers