Related papers: A self-evolving multi-role collaborative framework with fine-grained difficulty guidance for innovative mathematical problem generation

URL: http://arxiv.org/abs/2601.11792v1
Date: Fri, 16 Jan 2026 21:36:04 GMT
Title: A self-evolving multi-role collaborative framework with fine-grained difficulty guidance for innovative mathematical problem generation
Authors: Yifei Sun, Yongan Li, A. K. Qin, Sicheng Hou, Tamas Pflanzner,
Abstract summary: We propose the task of innovative math problem generation (IMPG). This paper proposes a self-evolving, multi-role collaborative framework with fine-grained difficulty guidance. Experiments show that, compared to baseline models, our proposed method significantly improves the innovation of the generated problems.
Score: 3.4082981066509928
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mathematical problem generation (MPG) is a significant research direction in the field of intelligent education. In recent years, the rapid development of large language models (LLMs) has enabled new technological approaches to problem-generation tasks. Although existing LLMs can achieve high correctness rates, they generally lack innovation and exhibit poor discrimination. In this paper, we propose the task of innovative math problem generation (IMPG). To solve the IMPG task, this paper proposes a self-evolving, multi-role collaborative framework with fine-grained difficulty guidance. First, a multi-role collaborative mechanism comprising a sampler, generator, evaluator, state machine, and memory is constructed, ensuring the correctness of generated problems through iterative optimization informed by self-assessment and external feedback. Second, we introduce an improved difficulty model to quantify difficulty and provide fine-grained guidance. We adopt the data-driven association-guided path sampling (DAPS) algorithm to enhance the semantic rationality of sampled encodings. Third, we construct the HSM3K-CN dataset, which comprises high-quality high school math problems. A multi-stage training pipeline is adopted, incorporating continual pre-training (CPT), supervised fine-tuning (SFT), and group relative policy optimization (GRPO), to enhance the generation and evaluation capabilities of the base model. Finally, system self-evolution is achieved by transferring evaluation capabilities from the expert model to the apprentice model via distillation. Experiments show that, compared to baseline models, our proposed method significantly improves the innovation of the generated problems while maintaining a high correctness rate.
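
The iterative generate-evaluate-revise loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the state names, the `run_pipeline` function, the `Memory` class, and the toy sampler, generator, and evaluator are all hypothetical stand-ins, and the real framework uses trained LLMs, a difficulty model, and DAPS sampling in place of these stubs.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple


class State(Enum):
    """Hypothetical states; the paper does not list its state machine's states."""
    SAMPLE = auto()
    GENERATE = auto()
    EVALUATE = auto()
    DONE = auto()
    FAILED = auto()


@dataclass
class Memory:
    """Keeps evaluator feedback so later drafts can take it into account."""
    feedback: List[str] = field(default_factory=list)


def run_pipeline(
    sampler: Callable[[], str],
    generator: Callable[[str, List[str]], str],
    evaluator: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[State, Optional[str], Memory]:
    """Drive sampler -> generator -> evaluator until a draft passes or rounds run out."""
    memory = Memory()
    state = State.SAMPLE
    spec = ""
    draft: Optional[str] = None
    rounds = 0
    while state not in (State.DONE, State.FAILED):
        if state is State.SAMPLE:
            spec = sampler()  # e.g. a topic plus a target difficulty
            state = State.GENERATE
        elif state is State.GENERATE:
            draft = generator(spec, memory.feedback)
            rounds += 1
            state = State.EVALUATE
        elif state is State.EVALUATE:
            ok, note = evaluator(draft)
            if ok:
                state = State.DONE
            else:
                memory.feedback.append(note)  # remember why the draft failed
                state = State.GENERATE if rounds < max_rounds else State.FAILED
    return state, draft, memory


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_sampler() -> str:
    return "quadratic equation, difficulty 2"


def toy_generator(spec: str, feedback: List[str]) -> str:
    problem = f"Solve x^2 - 5x + 6 = 0. ({spec})"
    # Only add a reference answer once the evaluator has asked for one.
    if feedback:
        problem += " Answer: x = 2 or x = 3"
    return problem


def toy_evaluator(draft: str) -> Tuple[bool, str]:
    if "Answer:" in draft:
        return True, "ok"
    return False, "missing reference answer"


final_state, problem, memory = run_pipeline(toy_sampler, toy_generator, toy_evaluator)
```

With these stubs, the first draft is rejected, the feedback is stored in memory, and the second draft passes. Bounding the loop with `max_rounds` is a design choice of this sketch so that an evaluator that never accepts cannot stall the pipeline.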

Related papers

Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks [48.105258051884384]
This paper proposes a new two-stage training framework that enhances models' self-correction capabilities. During the first stage, a multi-turn dialogue strategy guides the model to generate long chain-of-thought (CoT) data. The second stage employs a difficulty-aware rejection sampling mechanism to dynamically optimize data distribution.
arXiv Detail & Related papers (2026-01-09T08:19:11Z)
Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models [54.29243291958429]
We develop a problem generator that reasons explicitly to plan problem directions before synthesis. We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty. Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
arXiv Detail & Related papers (2025-11-13T03:08:51Z)
A Survey on Generative Recommendation: Data, Model, and Tasks [55.36322811257545]
Generative recommendation reconceptualizes recommendation as a generation task rather than discriminative scoring. This survey provides a comprehensive examination through a unified tripartite framework spanning data, model, and task dimensions. We identify five key advantages: world knowledge integration, natural language understanding, reasoning capabilities, scaling laws, and creative generation.
arXiv Detail & Related papers (2025-10-31T04:02:58Z)
Experience-Guided Reflective Co-Evolution of Prompts and Heuristics for Automatic Algorithm Design [124.54166764570972]
Combinatorial optimization problems are traditionally tackled with handcrafted algorithms. Recent progress has highlighted the potential of automatic design powered by large language models. We propose Experience-Guided Reflective Co-Evolution of Prompts and Heuristics (EvoPH) for automatic algorithm design.
arXiv Detail & Related papers (2025-09-29T09:24:09Z)
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [89.50068130832635]
Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation MLLMs using multimodal knowledge. We propose Chain-of-Description for step-by-step visual understanding and integrate structured Chain-of-Thought (CoT) reasoning to support in-depth multimodal reasoning. Experiments demonstrate SIcog's effectiveness in developing MLLMs with enhanced multimodal cognition.
arXiv Detail & Related papers (2025-03-16T00:25:13Z)
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models [17.673293240849787]
We introduce SPHERE, a self-evolving data generation pipeline that enhances reasoning in small language models (SLMs). SPHERE operates in three stages: (i) Self-Generation, where the model autonomously constructs problem-solving steps; (ii) Self-Correction, enabling it to identify and rectify errors; and (iii) Diversity Induction, improving robustness through multiple valid reasoning trajectories. We show that SPHERE-trained models achieve significant gains over their base versions and match/surpass GPT-4o on certain benchmarks.
arXiv Detail & Related papers (2025-03-04T14:43:25Z)
SRA-MCTS: Self-driven Reasoning Augmentation with Monte Carlo Tree Search for Code Generation [14.786100203787194]
Large language models demonstrate exceptional performance in simple code generation tasks but face challenges in tackling complex problems. We propose a reasoning-augmented data generation process, SRA-MCTS, which guides the model to autonomously generate high-quality intermediate reasoning paths. Our method operates entirely through the model itself without requiring additional supervision.
arXiv Detail & Related papers (2024-11-17T12:31:04Z)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.