AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
- URL: http://arxiv.org/abs/2510.24695v1
- Date: Tue, 28 Oct 2025 17:50:47 GMT
- Title: AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
- Authors: Xuanzhong Chen, Zile Qiao, Guoxin Chen, Liangcai Su, Zhen Zhang, Xinyu Wang, Pengjun Xie, Fei Huang, Jingren Zhou, Yong Jiang
- Abstract summary: Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD). We present the AgentFrontier Engine, an automated pipeline that synthesizes high-quality, multidisciplinary data situated precisely within the ZPD.
- Score: 69.06292316741126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which defines this frontier as tasks an LLM cannot solve alone but can master with guidance. To operationalize this, we present the AgentFrontier Engine, an automated pipeline that synthesizes high-quality, multidisciplinary data situated precisely within the LLM's ZPD. This engine supports both continued pre-training with knowledge-intensive data and targeted post-training on complex reasoning tasks. From the same framework, we derive the ZPD Exam, a dynamic and automated benchmark designed to evaluate agent capabilities on these frontier tasks. We train AgentFrontier-30B-A3B model on our synthesized data, which achieves state-of-the-art results on demanding benchmarks like Humanity's Last Exam, even surpassing some leading proprietary agents. Our work demonstrates that a ZPD-guided approach to data synthesis offers a scalable and effective path toward building more capable LLM agents.
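The abstract's ZPD criterion is a simple filter: keep only tasks the model fails unaided but solves with guidance. A minimal sketch of that criterion, assuming hypothetical `solve_alone` and `solve_with_guidance` predicates (the paper's actual engine is far more elaborate; outcomes here are stubbed with a lookup table purely for illustration):

```python
def in_zpd(task, solve_alone, solve_with_guidance):
    """A task lies in the ZPD if the model fails alone but succeeds with help."""
    return (not solve_alone(task)) and solve_with_guidance(task)

def filter_to_zpd(tasks, solve_alone, solve_with_guidance):
    """Keep only the tasks at the model's capability frontier."""
    return [t for t in tasks if in_zpd(t, solve_alone, solve_with_guidance)]

# Stubbed outcomes per task: (solved alone, solved with guidance).
outcomes = {
    "easy":     (True,  True),   # solvable alone -> too easy, excluded
    "frontier": (False, True),   # solvable only with guidance -> in ZPD
    "hard":     (False, False),  # unsolvable even with help -> too hard, excluded
}

zpd_tasks = filter_to_zpd(
    list(outcomes),
    solve_alone=lambda t: outcomes[t][0],
    solve_with_guidance=lambda t: outcomes[t][1],
)
print(zpd_tasks)  # -> ['frontier']
```

The same predicate pair, applied to held-out synthesized tasks rather than training data, would yield the "ZPD Exam" style of dynamic benchmark the abstract describes.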
Related papers
- O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL [28.10102994309489]
We introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning. We develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method.
arXiv Detail & Related papers (2026-01-07T09:31:10Z) - APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training [48.20667772172573]
APTBench is a framework that converts real-world agent tasks and successful trajectories into multiple-choice or text completion questions. It focuses on core agentic abilities, e.g., planning and action, and covers key agent scenarios such as software engineering and deep research. Compared to existing general-purpose benchmarks, APTBench offers a more predictive signal of a model's downstream performance as an agent.
arXiv Detail & Related papers (2025-10-28T13:11:22Z) - Tagging the Thought: Unlocking Personalization Reasoning via Reinforcement Learning [28.86134365096663]
We introduce TagPR, a novel training framework that enhances a Large Language Model's intrinsic capacity for personalization reasoning. Our approach achieves state-of-the-art results, delivering an average improvement of 32.65% over the base model across all tasks.
arXiv Detail & Related papers (2025-09-27T06:03:44Z) - WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning [73.91893534088798]
WebSailor is a complete post-training methodology designed to instill this crucial capability. Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation. WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-09-16T17:57:03Z) - SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research. Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [104.31926740841128]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL). This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled framework that formalizes the machine learning process. R&D-Agent divides MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy.
This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data.
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.