Dr. Zero: Self-Evolving Search Agents without Training Data
- URL: http://arxiv.org/abs/2601.07055v1
- Date: Sun, 11 Jan 2026 20:27:55 GMT
- Title: Dr. Zero: Self-Evolving Search Agents without Training Data
- Authors: Zhenrui Yue, Kartikeya Upasani, Xianjun Yang, Suyu Ge, Shaoliang Nie, Yuning Mao, Zhe Liu, Dong Wang,
- Abstract summary: We introduce Dr. Zero, a framework enabling search agents to effectively self-evolve without any training data. In particular, we design a self-evolution feedback loop where a proposer generates diverse questions to train a solver from the same base model. To enhance training efficiency, we also introduce hop-grouped relative policy optimization (HRPO).
- Score: 34.91191770652202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As high-quality data becomes increasingly difficult to obtain, data-free self-evolution has emerged as a promising paradigm. This approach allows large language models (LLMs) to autonomously generate and solve complex problems, thereby improving their reasoning capabilities. However, multi-turn search agents struggle in data-free self-evolution due to limited question diversity and the substantial compute required for multi-step reasoning and tool use. In this work, we introduce Dr. Zero, a framework enabling search agents to effectively self-evolve without any training data. In particular, we design a self-evolution feedback loop where a proposer generates diverse questions to train a solver initialized from the same base model. As the solver evolves, it incentivizes the proposer to produce increasingly difficult yet solvable tasks, thus establishing an automated curriculum to refine both agents. To enhance training efficiency, we also introduce hop-grouped relative policy optimization (HRPO). This method clusters structurally similar questions to construct group-level baselines, effectively minimizing the sampling overhead in evaluating each query's individual difficulty and solvability. Consequently, HRPO significantly reduces the compute requirements for solver training without compromising performance or stability. Extensive experimental results demonstrate that the data-free Dr. Zero matches or surpasses fully supervised search agents, proving that complex reasoning and search capabilities can emerge solely through self-evolution.
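The abstract describes HRPO's core trick: questions sharing the same structural hop count are clustered, and each cluster provides a group-level reward baseline, avoiding extra sampling to estimate every query's individual difficulty. A minimal sketch of that grouped-baseline advantage computation, assuming scalar rewards per rollout and the hop count as the grouping key (the function name and data layout are illustrative, not from the paper):

```python
from collections import defaultdict

def hop_grouped_advantages(samples):
    """Compute group-relative advantages with one baseline per hop count.

    `samples` is a list of dicts with keys:
      'hops'   - number of reasoning hops in the question (the group key)
      'reward' - scalar reward for the solver's rollout on that question
    Returns a list of advantages aligned with `samples`.
    """
    # Cluster sample indices by hop count.
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[s['hops']].append(i)

    advantages = [0.0] * len(samples)
    for idx in groups.values():
        rewards = [samples[i]['reward'] for i in idx]
        mean = sum(rewards) / len(rewards)
        var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
        std = var ** 0.5
        # Standardize each reward against its own group's statistics.
        for i in idx:
            advantages[i] = (samples[i]['reward'] - mean) / (std + 1e-8)
    return advantages
```

Each group's advantages are standardized against that group's mean and standard deviation, mirroring GRPO-style baselines but computed per hop cluster rather than per individual prompt.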
Related papers
- Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning [16.95900718416944]
We introduce a novel Bidirectional Curriculum Generation framework to maximize the instructional value of every training sample. Unlike rigid trajectories, our multi-agent ecosystem mimics adaptive pedagogy to establish a closed feedback loop. This mechanism ensures that the model consumes only the most effective data at any given stage.
arXiv Detail & Related papers (2026-03-05T12:49:21Z) - Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [84.70211451226835]
Large Language Model (LLM) Agents are constrained by a dependency on human-curated data. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data. Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks.
arXiv Detail & Related papers (2025-11-20T05:01:57Z) - Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models [54.29243291958429]
We develop a problem generator that reasons explicitly to plan problem directions before synthesis. We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty. Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
arXiv Detail & Related papers (2025-11-13T03:08:51Z) - Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search [70.63903518295785]
We introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte Carlo Tree Search. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks.
arXiv Detail & Related papers (2025-10-30T09:10:36Z) - BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data [8.52473384574856]
We present an automated framework for generating high-difficulty, training-ready multi-hop questions from semi-structured knowledge sources. The system grows diverse, logically labeled evidence clusters through Natural Language Inference (NLI)-based relation typing and diversity-aware expansion.
arXiv Detail & Related papers (2025-10-28T07:43:15Z) - Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms [81.90219895125178]
Web-based 'deep research' agents aim to solve complex question-answering tasks through long-horizon interactions with online tools. These tasks remain challenging, as the underlying language models are often not optimized for long-horizon reasoning. We introduce a two-pronged data synthesis pipeline that generates question-answer pairs by progressively increasing complexity.
arXiv Detail & Related papers (2025-10-15T06:34:46Z) - EvolMathEval: Towards Evolvable Benchmarks for Mathematical Reasoning via Evolutionary Testing [45.89558878854675]
EvolMathEval is an automated mathematical benchmark generation and evolution framework based on evolutionary testing. It can generate a large volume of high-difficulty problems through continuous self-iteration. It can also significantly enhance the complexity of public datasets like GSM8K through evolution, reducing model accuracy by an average of 48%.
arXiv Detail & Related papers (2025-08-18T15:24:10Z) - Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization [15.476478159958416]
We employ a large language model (LLM) to enhance evolutionary search for solving constrained multi-objective optimization problems.
Our aim is to speed up the convergence of the evolutionary population.
arXiv Detail & Related papers (2024-05-09T13:44:04Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
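The dynamic federated learning entry above describes rounds in which a random subset of agents performs local updates on its own data, which a server then aggregates. A toy sketch of one such round, assuming a scalar model parameter and mean-squared-error local losses (the function name, data layout, and loss choice are illustrative, not the paper's exact model):

```python
import random

def federated_round(global_model, agent_data, participation=0.5, lr=0.1, seed=None):
    """Run one round of federated learning with partial participation.

    A random subset of agents takes one local gradient step on its own
    targets, and the server averages the returned models. The model is a
    plain float and each agent's data is a list of target values.
    """
    rng = random.Random(seed)
    k = max(1, int(participation * len(agent_data)))
    chosen = rng.sample(range(len(agent_data)), k)

    local_models = []
    for a in chosen:
        w = global_model
        targets = agent_data[a]
        # Gradient of mean((w - t)^2) over the agent's targets is 2 * mean(w - t).
        grad = 2.0 * sum(w - t for t in targets) / len(targets)
        local_models.append(w - lr * grad)

    # The server aggregates by simple averaging over participating agents.
    return sum(local_models) / len(local_models)
```

The random subset per round models agent availability; the abstract's analysis additionally tracks a drifting (non-stationary) optimum, which this toy omits.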
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.