Fugu-MT Paper Translation (Summary): Learning to Retrieve from Agent Trajectories

Paper summary: Learning to Retrieve from Agent Trajectories

arxiv url: http://arxiv.org/abs/2604.04949v1
Date: Mon, 30 Mar 2026 17:59:02 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.353953
Title: Learning to Retrieve from Agent Trajectories
Title (reference translation): Learning from Agent Trajectories
Authors: Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
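Abstract summary: We argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, in which supervision is derived from multi-step agent interactions. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.
Reference score (independently computed attention score): 72.8923565916533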
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility, including browsing actions, unbrowsed rejections, and post-browse reasoning traces. Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization. Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve evidence recall, end-to-end task success, and execution efficiency across diverse agent architectures and scales. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.
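The abstract names three trajectory signals (browsing actions, unbrowsed rejections, post-browse reasoning traces) and a weighted optimization objective, but does not spell out the loss or the data format. The following is a minimal sketch under stated assumptions, not the authors' implementation: browsed documents are treated as positives, retrieved-but-unbrowsed documents as negatives, a hypothetical per-document `utility` value stands in for the relevance intensity derived from reasoning traces, and a weighted InfoNCE loss is swapped in as one plausible objective. The names `Step`, `mine_examples`, and `weighted_info_nce` are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Step:
    """One search step of an agent trajectory (hypothetical schema)."""
    query: str
    retrieved: List[str]                 # document ids returned to the agent
    browsed: List[str]                   # ids the agent chose to open
    # Assumed per-document intensity in (0, 1], e.g. scored from post-browse reasoning.
    utility: Dict[str, float] = field(default_factory=dict)


def mine_examples(steps: Sequence[Step]) -> List[Tuple[str, str, List[str], float]]:
    """Turn steps into (query, positive, negatives, weight) training examples."""
    examples = []
    for step in steps:
        browsed = set(step.browsed)
        # Retrieved-but-unbrowsed documents act as negatives (assumption).
        negatives = [d for d in step.retrieved if d not in browsed]
        for doc in step.retrieved:
            if doc in browsed:
                weight = step.utility.get(doc, 1.0)  # default weight when unscored
                examples.append((step.query, doc, negatives, weight))
    return examples


def weighted_info_nce(
    pos_scores: Sequence[float],
    neg_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Weighted mean of per-example InfoNCE losses over retriever scores."""
    if not (len(pos_scores) == len(neg_scores) == len(weights)):
        raise ValueError("pos_scores, neg_scores and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    loss = 0.0
    for pos, negs, w in zip(pos_scores, neg_scores, weights):
        logits = [pos / temperature] + [n / temperature for n in negs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += w * (log_z - logits[0])  # -log softmax probability of the positive
    return loss / total_weight


steps = [Step("q", ["a", "b", "c"], ["b"], {"b": 0.5})]
examples = mine_examples(steps)
loss = weighted_info_nce([2.0], [[0.5, 0.1]], [examples[0][3]])
```

In this sketch the weight only rescales each example's contribution, so a document the agent found highly useful pulls the retriever harder than one it opened and then discarded; how the paper actually computes and applies relevance intensity is not stated in the abstract.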
