AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
- URL: http://arxiv.org/abs/2603.04384v2
- Date: Thu, 05 Mar 2026 18:56:37 GMT
- Title: AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
- Authors: Zijian Chen, Xueguang Ma, Shengyao Zhuang, Jimmy Lin, Akari Asai, Victor Zhong,
- Abstract summary: Deep Research agents generate explicit natural language reasoning before each search call.<n> Reasoning-Aware Retrieval embeds the agent's reasoning trace alongside its query.<n>DR- Synth generates Deep Research retriever training data from standard QA datasets.<n>AgentIR-4B achieves 68% accuracy with the open-weight agent Tongyi-DeepResearch.
- Score: 76.29382561831105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems. Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich intent and contextual information that existing retrievers entirely ignore. To exploit this overlooked signal, we introduce: (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the agent's reasoning trace alongside its query; and (2) DR-Synth, a data synthesis method that generates Deep Research retriever training data from standard QA datasets. We demonstrate that both components are independently effective, and their combination yields a trained embedding model, AgentIR-4B, with substantial gains. On the challenging BrowseComp-Plus benchmark, AgentIR-4B achieves 68\% accuracy with the open-weight agent Tongyi-DeepResearch, compared to 50\% with conventional embedding models twice its size, and 37\% with BM25. Code and data are available at: https://texttron.github.io/AgentIR/.
Related papers
- SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback [68.60326181052658]
We propose an agentic pipeline that automatically generates high quality, difficulty-controlled deep search question-answer pairs.<n>Our pipeline, SAGE, consists of a data generator which proposes QA pairs and a search agent which attempts to solve the generated question.<n>Our intrinsic evaluation shows SAGE generates questions that require diverse reasoning strategies, while significantly increases the correctness and difficulty of the generated data.
arXiv Detail & Related papers (2026-01-26T06:37:56Z) - AgentExpt: Automating AI Experiment Design with LLM-based Resource Retrieval Agent [36.65355075707938]
One key application in AI research is to automate experiment design through agentic and baseline retrieval.<n>We present a comprehensive framework for baseline and dataset recommendation.<n>We develop a reasoning-augmented reranker that exact interaction chains to construct explicit reasoning chains and finetunes a large language model to produce interpretable justifications.
arXiv Detail & Related papers (2025-11-07T01:51:56Z) - Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics [75.4712507893024]
Enterprise Deep Research (EDR) is a multi-agent system that integrates a Master Planning Agent for adaptive query decomposition.<n>Four specialized search agents (General, Academic, GitHub, LinkedIn) and a visualization agent for data-driven insights are also included.<n>EDR reflects research direction with optional human-in-the-loop steering guidance.
arXiv Detail & Related papers (2025-10-20T17:55:11Z) - InfoAgent: Advancing Autonomous Information-Seeking Agents [143.15973604285304]
We introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools.<n>With our methods, InfoAgent achieves 15.3% accuracy on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
arXiv Detail & Related papers (2025-09-29T17:59:57Z) - Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models.<n>The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying.<n>The second is Fathom- Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
arXiv Detail & Related papers (2025-09-28T22:58:11Z) - DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery [26.388978716803464]
Can AI agents transcend conventional search to systematically discover any dataset that meets specific user requirements?<n>Our benchmark and comprehensive analysis provide the foundation for the next generation of self-improving AI systems.
arXiv Detail & Related papers (2025-08-09T12:15:08Z) - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [74.10138164281618]
BrowseComp-Plus is a benchmark derived from BrowseComp, employing a fixed, carefully curated corpus.<n>This benchmark allows comprehensive evaluation and disentangled analysis of deep research agents and retrieval methods.
arXiv Detail & Related papers (2025-08-08T17:55:11Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.