Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models
- URL: http://arxiv.org/abs/2511.09907v1
- Date: Fri, 14 Nov 2025 01:17:46 GMT
- Title: Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models
- Authors: Yongxian Wei, Yilin Zhao, Li Shen, Xinrui Chen, Runxi Cheng, Sinan Du, Hao Yu, Gang Liu, Jiahong Yan, Chun Yuan, Dian Li
- Abstract summary: We develop a problem generator that reasons explicitly to plan problem directions before synthesis. We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty. Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
- Score: 54.29243291958429
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a problem generator that reasons explicitly to plan problem directions before synthesis and adapts difficulty to the solver's ability. Specifically, we construct related problem pairs and augment them with intermediate problem-design CoT produced by a reasoning model. These data bootstrap problem-design strategies from the generator. Then, we treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty and produce complementary problems near the edge of the solver's competence. Extensive experiments on 10 mathematical and general reasoning benchmarks show that our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models. Moreover, a solver trained on the synthesized data provides improved rewards for continued generator training, enabling co-evolution and yielding a further 0.7% performance gain. Our code will be made publicly available here.
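The abstract does not specify the form of the solver-feedback reward, so the following is only a minimal sketch of one plausible shaping consistent with "problems near the edge of the solver's competence": the generator's reward peaks when the solver's empirical pass rate on a synthetic problem is neither 0 nor 1. The function name, the linear decay, and the target pass rate are all hypothetical, not taken from the paper.

```python
def edge_of_competence_reward(solver_successes: int, attempts: int,
                              target_rate: float = 0.5) -> float:
    """Score a synthetic problem by how close it lands to the solver's ability edge.

    Hypothetical shaping: the reward is 1.0 when the solver's empirical pass
    rate equals `target_rate` (the problem is neither trivial nor hopeless)
    and decays linearly to 0.0 as the pass rate approaches 0.0 or 1.0.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    pass_rate = solver_successes / attempts
    # Normalize by the larger side so both extremes (0.0 and 1.0) map to reward 0.
    return 1.0 - abs(pass_rate - target_rate) / max(target_rate, 1.0 - target_rate)
```

Under this sketch, a generator trained with such a reward is pushed away from problems the solver always solves (reward 0) and from problems it never solves (also reward 0), concentrating synthesis where training signal is richest.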
Related papers
- QueST: Incentivizing LLMs to Generate Difficult Problems [77.75835742350644]
Large Language Models have achieved strong performance on reasoning tasks, solving competition-level coding and math problems. Existing competitive coding datasets contain only thousands to tens of thousands of problems. We propose QueST, a novel framework which combines difficulty-aware graph sampling and difficulty-aware rejection fine-tuning.
arXiv Detail & Related papers (2025-10-20T16:29:53Z)
- ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning [51.946959481392064]
Large Reasoning Models (LRMs) have shown impressive capabilities in complex problem-solving. We propose ScaleDiff, a pipeline designed to scale the creation of difficult problems. We show that our pipeline can effectively transfer advanced reasoning capabilities without relying on larger, more expensive teacher models.
arXiv Detail & Related papers (2025-09-25T12:22:44Z)
- Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding [59.60915947702282]
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability. We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
arXiv Detail & Related papers (2025-09-08T17:36:21Z)
- SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms [4.569384312180347]
We present SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms. We generate high-quality and diverse synthetic math problem and solution pairs using only a single model. We show that filtering the generated data by difficulty and then fine-tuning the same model improves relative model performance by up to 24%.
arXiv Detail & Related papers (2025-06-06T19:49:42Z)
- RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel mathematical synthesis approach. It generates graphs as solutions by combining Python-formatted functions from a structured function library. Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
arXiv Detail & Related papers (2025-04-29T04:42:02Z)
- Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch [54.12139707822201]
We propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method. By generating diverse questions from scratch, we produce a dataset of 1 million problem-solution pairs. Our experiments demonstrate that models trained on our data outperform models trained on existing open-source datasets.
arXiv Detail & Related papers (2024-10-24T12:42:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.