Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
- URL: http://arxiv.org/abs/2602.08222v1
- Date: Mon, 09 Feb 2026 02:50:40 GMT
- Title: Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
- Authors: Zehao Chen, Gongxun Li, Tianxiang Ai, Yifei Li, Zixuan Huang, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban
- Abstract summary: We propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements.
- Score: 46.50703640719333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.
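The abstract names two ingredients, entropy dynamics to locate recoverable learning gaps and compensatory learning to reinforce them, without giving details. The sketch below is one plausible reading, not the authors' released implementation: it assumes the gap for a token is scored by comparing per-token entropy between a weak historical checkpoint and the current strong model, and that compensatory learning upweights the loss on high-gap tokens. All function names and the thresholding rule are illustrative.

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-token predictive entropy from logits of shape [batch, seq, vocab]."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)  # [batch, seq]

def compensatory_loss(strong_logits, weak_logits, labels, tau=0.5):
    """Hypothetical compensatory objective: upweight tokens where a weak
    historical checkpoint is still uncertain but the strong model is already
    confident, treating those tokens as recoverable learning gaps."""
    gap = (token_entropy(weak_logits) - token_entropy(strong_logits)).clamp(min=0)
    weights = 1.0 + (gap > tau).float()  # double the weight on gap tokens
    # Token-level cross-entropy; F.cross_entropy expects [batch, vocab, seq].
    ce = F.cross_entropy(strong_logits.transpose(1, 2), labels, reduction="none")
    return (weights * ce).mean()
```

Note that in this reading the weak checkpoint is consulted only during training, which is consistent with the abstract's claim of zero additional inference cost.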
Related papers
- Reinforcement Learning with Backtracking Feedback [12.680874918250069]
We introduce Reinforcement Learning with Backtracking Feedback (RLBF). This framework advances upon prior methods, such as BSAFE. We show that RLBF significantly reduces attack success rates across diverse benchmarks and model scales.
arXiv Detail & Related papers (2026-02-09T08:23:19Z)
- From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior. We demonstrate how uncertainty is leveraged as an active control signal across three frontiers. This survey argues that mastering this evolving role of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z)
- Co-Evolving Agents: Learning from Failures as Hard Negatives [38.61683607205988]
Recent work has explored self-improving agents that autonomously generate, refine, and re-train on their own trajectories. We propose a co-evolving agents framework in which a target agent improves jointly with an auxiliary failure agent (one way to realize the hard-negative idea is sketched after this list).
arXiv Detail & Related papers (2025-11-27T09:30:33Z)
- Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models [61.78513830395669]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for improving the reasoning abilities of large language models (LLMs). As models train longer and scale larger, more training prompts become residual prompts: prompts with zero-variance rewards that provide no training signal. We propose the Explore Residual Prompts in Policy Optimization framework, which encourages exploration on residual prompts and reactivates their training signals (a numeric sketch after this list shows why zero-variance rewards yield no gradient).
arXiv Detail & Related papers (2025-11-06T20:40:27Z)
- Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning [25.53799024782883]
Large language model (LLM) unlearning aims to surgically remove the influence of undesired data or knowledge from an existing model. Recent findings reveal that manipulations of the unlearned model, such as weight quantization or fine-tuning, can quickly neutralize the intended forgetting.
arXiv Detail & Related papers (2025-10-01T10:50:14Z)
- On the Diminishing Returns of Complex Robust RAG Training in the Era of Powerful LLMs [85.688901949146]
We investigate the question: does the benefit of complex robust training methods diminish as language models become more powerful? Our analysis reveals a consistent trend: the marginal robustness benefit of sophisticated training strategies decreases substantially as model capacity increases. Further investigation demonstrates that stronger models naturally exhibit better confidence calibration, cross-dataset generalization capability, and more effective attention patterns, even under simple training regimes.
arXiv Detail & Related papers (2025-02-17T03:34:31Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
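Two of the entries above reference sketches. First, for "Co-Evolving Agents: Learning from Failures as Hard Negatives", one plausible way to use a failure agent's trajectories as hard negatives is a standard DPO-style preference loss with the failure trajectory as the rejected sample. This illustrates the general idea and is not necessarily that paper's objective; the beta scaling and sequence log-probabilities follow the usual DPO formulation.

```python
import torch
import torch.nn.functional as F

def dpo_hard_negative_loss(logp_success, logp_failure,
                           ref_logp_success, ref_logp_failure, beta=0.1):
    """Standard DPO loss with a failure trajectory as the hard negative.
    Each argument is the summed token log-probability of a full trajectory
    under the policy (logp_*) or a frozen reference model (ref_logp_*)."""
    margin = beta * ((logp_success - ref_logp_success)
                     - (logp_failure - ref_logp_failure))
    return -F.logsigmoid(margin).mean()

# Toy usage with scalar log-probs for a single (success, failure) pair:
loss = dpo_hard_negative_loss(
    torch.tensor([-12.0]), torch.tensor([-9.0]),   # policy log-probs
    torch.tensor([-11.0]), torch.tensor([-10.0]),  # reference log-probs
)
print(loss)  # large when the policy prefers the failure relative to the reference
```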
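Second, for "Explore Data Left Behind in Reinforcement Learning", the claim that zero-variance rewards provide no training signal is mechanical in group-relative schemes such as GRPO: advantages are rewards standardized within a prompt's sampled group, so an all-correct or all-wrong group standardizes to zero. A minimal numeric sketch (the epsilon and group size are arbitrary):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# A mixed group of completions still carries a learning signal:
print(group_relative_advantages([1, 0, 1, 0]))  # approx. [ 1, -1,  1, -1]

# A "residual prompt" whose completions all score the same does not:
print(group_relative_advantages([1, 1, 1, 1]))  # [0, 0, 0, 0] -> no gradient
```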