An Open and Reproducible Deep Research Agent for Long-Form Question Answering
- URL: http://arxiv.org/abs/2512.13059v1
- Date: Mon, 15 Dec 2025 07:37:53 GMT
- Title: An Open and Reproducible Deep Research Agent for Long-Form Question Answering
- Authors: Ikuya Yamada, Wataru Ikeda, Ko Yoshida, Mengyu Ye, Hinata Sugimoto, Masatoshi Suzuki, Hisanori Ozaki, Jun Suzuki,
- Abstract summary: We present an open deep research system for long-form question answering, selected as a winning system in the text-to-text track of the MMU-RAG competition at NeurIPS 2025.<n>The system combines an open-source large language model (LLM) with an open web search API to perform iterative retrieval, reasoning, and synthesis in real-world open-domain settings.
- Score: 8.315124732850943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an open deep research system for long-form question answering, selected as a winning system in the text-to-text track of the MMU-RAG competition at NeurIPS 2025. The system combines an open-source large language model (LLM) with an open web search API to perform iterative retrieval, reasoning, and synthesis in real-world open-domain settings. To enhance reasoning quality, we apply preference tuning based on LLM-as-a-judge feedback that evaluates multiple aspects, including clarity, insightfulness, and factuality. Our experimental results show that the proposed method consistently improves answer quality across all three aspects. Our source code is publicly available at https://github.com/efficient-deep-research/efficient-deep-research.
Related papers
- Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models [79.77807330964576]
Vision-DeepResearch systems use search engines for complex visual-textual fact-finding.<n>Existing benchmarks are not visual search-centric.<n>We construct the Vision-DeepResearch benchmark (VDR-Bench) comprising 2,000 VQA instances.
arXiv Detail & Related papers (2026-02-02T14:53:11Z) - Understanding DeepResearch via Reports [41.60038455664918]
DeepResearch is a transformative AI paradigm, conducting expert-level research through sophisticated reasoning and multi-tool integration.<n> evaluating these systems remains critically challenging due to open-ended research scenarios and existing benchmarks that focus on isolated capabilities.<n>We introduce DeepResearch-ReportEval, a comprehensive framework designed to assess DeepResearch systems through their most representative outputs: research reports.
arXiv Detail & Related papers (2025-10-09T07:03:43Z) - Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models.<n>The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying.<n>The second is Fathom- Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
arXiv Detail & Related papers (2025-09-28T22:58:11Z) - DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis [52.636738269442766]
We introduce DeepScholar-bench, a live benchmark and holistic, automated evaluation framework designed to evaluate generative research synthesis systems.<n>DeepScholar-bench draws queries from recent, high-quality ArXiv papers and focuses on a real research synthesis task.<n>We also develop DeepScholar-base, a reference pipeline implemented efficiently using the LOTUS API.
arXiv Detail & Related papers (2025-08-27T16:36:34Z) - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent [74.10138164281618]
BrowseComp-Plus is a benchmark derived from BrowseComp, employing a fixed, carefully curated corpus.<n>This benchmark allows comprehensive evaluation and disentangled analysis of deep research agents and retrieval methods.
arXiv Detail & Related papers (2025-08-08T17:55:11Z) - DeepResearch$^{\text{Eco}}$: A Recursive Agentic Workflow for Complex Scientific Question Answering in Ecology [0.0]
DeepResearch is a novel agentic LLM-based system for automated scientific synthesis.<n>It supports depth- and breadth-controlled exploration of original research questions.<n>DeepResearch achieves up to a 21-fold increase in source integration.
arXiv Detail & Related papers (2025-07-14T17:47:28Z) - From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research [25.368303145176554]
DeepResearchGym is an open-source sandbox that combines a search API with a rigorous evaluation protocol for benchmarking deep research systems.<n>The API indexes large-scale public web corpora, namely ClueWeb22 and FineWeb, using a state-of-the-art dense retriever and approximate nearest neighbor search via DiskANN.<n>It achieves lower latency than popular commercial APIs while ensuring stable document rankings across runs, and is freely available for research use.
arXiv Detail & Related papers (2025-05-25T18:16:13Z) - ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework [73.91207117772291]
ManuSearch is a transparent and modular multi-agent framework designed to democratize deep search for large language models (LLMs)<n>ManuSearch decomposes the search and reasoning process into three collaborative agents: (1) a solution planning agent that iteratively formulates sub-queries, (2) an Internet search agent that retrieves relevant documents via real-time web search, and (3) a structured webpage reading agent that extracts key evidence from raw web content.
arXiv Detail & Related papers (2025-05-23T17:02:02Z) - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [94.33978856270268]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios.<n>Existing approaches face critical limitations that lack high-quality training trajectories and suffer from distributional mismatches.<n>This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
arXiv Detail & Related papers (2025-05-22T16:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.