Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
- URL: http://arxiv.org/abs/2602.03279v1
- Date: Tue, 03 Feb 2026 09:02:53 GMT
- Title: Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
- Authors: Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, Linfeng Zhang,
- Abstract summary: Agentic Proposing is a framework that models problem synthesis as a goal-driven sequential decision process. It generates high-precision, verifiable training trajectories across mathematics, coding, and science. A 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25.
- Score: 10.951981109673119
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.
Related papers
- Agentic Adversarial QA for Improving Domain-Specific LLMs [53.00642389531106]
Large Language Models (LLMs) often struggle to adapt effectively to specialized domains. We propose an adversarial question-generation framework that produces a compact set of semantically challenging questions.
arXiv Detail & Related papers (2026-02-20T10:53:09Z)
- Beyond Quantity: Trajectory Diversity Scaling for Code Agents [51.71414642763219]
Trajectory Diversity Scaling is a data synthesis framework for code agents that scales performance through diversity rather than raw volume. TDScaling integrates four innovations, including (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a blueprint-driven multi-agent paradigm that enforces trajectory coherence; and (3) an adaptive evolution mechanism that steers toward long-tail scenarios.
arXiv Detail & Related papers (2026-02-03T07:43:03Z)
- RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis [29.39426376890088]
Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. We introduce RAGShaper, a novel data synthesis framework designed to automate the construction of RAG tasks and robust agent trajectories.
arXiv Detail & Related papers (2026-01-13T16:25:07Z)
- Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks [48.105258051884384]
This paper proposes a new two-stage training framework that enhances models' self-correction capabilities. During the first stage, a multi-turn dialogue strategy guides the model to generate long chain-of-thought (CoT) data. The second stage employs a difficulty-aware rejection sampling mechanism to dynamically optimize data distribution.
arXiv Detail & Related papers (2026-01-09T08:19:11Z)
- Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models [54.29243291958429]
We develop a problem generator that reasons explicitly to plan problem directions before synthesis. We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty. Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
arXiv Detail & Related papers (2025-11-13T03:08:51Z)
- EvoSyn: Generalizable Evolutionary Data Synthesis for Verifiable Learning [63.03672166010434]
We introduce an evolutionary, task-agnostic, strategy-guided, executably-checkable data synthesis framework. It jointly synthesizes problems, diverse candidate solutions, and verification artifacts. It iteratively discovers strategies via a consistency-based evaluator that enforces agreement between human-annotated and strategy-induced checks.
arXiv Detail & Related papers (2025-10-20T11:56:35Z)
- Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called Guided Topology Diffusion (GTD). Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process. At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards. Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z)
- Modèles de Substitution pour les Modèles à base d'Agents : Enjeux, Méthodes et Applications [0.0]
Agent-based models (ABM) are widely used to study emergent phenomena arising from local interactions. The complexity of ABM limits their feasibility for real-time decision-making and large-scale scenario analysis. To address these limitations, surrogate models offer an efficient alternative by learning approximations from sparse simulation data.
arXiv Detail & Related papers (2025-05-17T08:55:33Z)
- MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning? [51.85759493254735]
MindGYM is a structured and scalable framework for question synthesis. It infuses high-level reasoning objectives to shape the model's synthesis behavior. It composes more complex multi-hop questions based on QA seeds for deeper reasoning.
arXiv Detail & Related papers (2025-03-12T16:03:03Z)
- Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization [0.6629765271909505]
This paper introduces a novel approach to model alignment through weak-to-strong generalization in the context of language models.
Our results suggest that this facilitation-based approach not only enhances model performance but also provides insights into the nature of model alignment.
arXiv Detail & Related papers (2024-09-11T15:16:25Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.