Related papers: MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning

MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning

URL: http://arxiv.org/abs/2601.17006v1
Date: Wed, 14 Jan 2026 07:28:42 GMT
Title: MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning
Authors: Xuchen Li, Jing Chen, Xuzhao Li, Hao Liang, Xiaohuan Zhou, Taifeng Wang, Wentao Zhang,
Abstract summary: MathMixup is a novel data synthesis paradigm that generates high-quality, difficulty-controllable mathematical reasoning problems.<n>We show that MathMixup and its curriculum learning strategy significantly enhance the mathematical reasoning performance of Large Language Models.
Score: 17.497429897140695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In mathematical reasoning tasks, the advancement of Large Language Models (LLMs) relies heavily on high-quality training data with clearly defined and well-graded difficulty levels. However, existing data synthesis methods often suffer from limited diversity and lack precise control over problem difficulty, making them insufficient for supporting efficient training paradigms such as curriculum learning. To address these challenges, we propose MathMixup, a novel data synthesis paradigm that systematically generates high-quality, difficulty-controllable mathematical reasoning problems through hybrid and decomposed strategies. Automated self-checking and manual screening are incorporated to ensure semantic clarity and a well-structured difficulty gradient in the synthesized data. Building on this, we construct the MathMixupQA dataset and design a curriculum learning strategy that leverages these graded problems, supporting flexible integration with other datasets. Experimental results show that MathMixup and its curriculum learning strategy significantly enhance the mathematical reasoning performance of LLMs. Fine-tuned Qwen2.5-7B achieves an average score of 52.6\% across seven mathematical benchmarks, surpassing previous state-of-the-art methods. These results fully validate the effectiveness and broad applicability of MathMixup in improving the mathematical reasoning abilities of LLMs and advancing data-centric curriculum learning.

Related papers

REAMS: Reasoning Enhanced Algorithm for Maths Solving [1.7023742022011212]
We introduce a language-based solution that leverages zero-shot learning and mathematical reasoning to solve, explain, and generate solutions for advanced math problems.<n>Our method achieves an accuracy of 90.15%, representing a substantial improvement over the previous benchmark of 81% and setting a new standard in automated mathematical problem-solving.
arXiv Detail & Related papers (2025-09-16T21:09:48Z)
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy [47.27578055313302]
MathSmith is a novel framework for challenging mathematical problems to enhance LLM reasoning.<n>Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath.<n>To increase difficulty, we design nine predefined strategies as soft constraints during rationales.<n>Experiments show MathSmith consistently outperforms existing baselines under both short and long CoT settings.
arXiv Detail & Related papers (2025-08-07T17:32:14Z)
WarriorMath: Enhancing the Mathematical Ability of Large Language Models with a Defect-aware Framework [42.74246647841103]
WarriorMath is a defect-aware framework for mathematical problem solving.<n>We employ multiple expert LLMs in a collaborative process to generate, critique, and refine problems.<n>In the training stage, we introduce a progressive learning framework that iteratively fine-tunes the model using increasingly challenging data tailored to its weaknesses.
arXiv Detail & Related papers (2025-08-02T07:45:12Z)
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class [27.93059568425132]
HARDMath2 is a dataset of 211 original problems covering the core topics in a graduate applied math class.<n>This dataset was designed and verified by the students and instructors of a core graduate applied mathematics course at Harvard.<n>We build the dataset through a novel collaborative environment that challenges students to write and refine difficult problems consistent with the class syllabus.
arXiv Detail & Related papers (2025-05-17T00:52:49Z)
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models [86.45058529521258]
OlymMATH is a novel Olympiad-level mathematical benchmark designed to rigorously test the complex reasoning capabilities of LLMs.<n>OlymMATH features 200 meticulously curated problems, each manually verified and available in parallel English and Chinese versions.
arXiv Detail & Related papers (2025-03-27T11:20:17Z)
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion [48.443460251524776]
MathFusion is a novel framework that enhances mathematical reasoning through cross-problem instruction synthesis.<n>MathFusion achieves substantial improvements in mathematical reasoning while maintaining high data efficiency.
arXiv Detail & Related papers (2025-03-20T15:00:41Z)
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages [13.377908992869814]
Problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora.<n>We identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance.
arXiv Detail & Related papers (2025-01-23T12:14:57Z)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data [20.31528845718877]
Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset.
arXiv Detail & Related papers (2024-06-26T13:02:35Z)
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models. It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model(PLM) Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement. We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.