Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
- URL: http://arxiv.org/abs/2501.14002v3
- Date: Mon, 24 Mar 2025 02:20:01 GMT
- Title: Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
- Authors: Zui Chen, Tianqiao Liu, Mi Tian, Qing Tong, Weiqi Luo, Zitao Liu
- Abstract summary: Problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora. We identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance.
- Score: 13.377908992869814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mathematical reasoning remains a challenging area for large language models (LLMs), prompting the development of math-specific LLMs such as LLEMMA, DeepSeekMath, and Qwen2-Math, among others. These models typically follow a two-stage training paradigm: pre-training with math-related corpora and post-training with problem datasets for supervised fine-tuning (SFT). Despite these efforts, the improvements in mathematical reasoning achieved through continued pre-training (CPT) are often less significant compared to those obtained via SFT. This study addresses this discrepancy by exploring alternative strategies during the pre-training phase, focusing on the use of problem-solving data over general mathematical corpora. We investigate three primary research questions: (1) Can problem-solving data enhance the model's mathematical reasoning capabilities more effectively than general mathematical corpora during CPT? (2) Are synthetic data from the same source equally effective, and which synthesis methods are most efficient? (3) How do the capabilities developed from the same problem-solving data differ between the CPT and SFT stages, and what factors contribute to these differences? Our findings indicate that problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora. We also identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance. Furthermore, while SFT facilitates instruction-following abilities, it underperforms compared to CPT with the same data, which can be partially attributed to its poor learning capacity for more challenging problem-solving data. These insights provide valuable guidance for optimizing the mathematical reasoning capabilities of LLMs, culminating in our development of a powerful mathematical base model called MathGPT-8B.
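For concreteness, the CPT-versus-SFT comparison above ultimately comes down to how the same problem-solving example is presented to the model: continued pre-training treats it as plain next-token-prediction text, while SFT formats it as an instruction pair and typically masks the prompt tokens out of the loss. The sketch below illustrates that distinction only; it is not the paper's released code, and the prompt template and tokenizer name are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): one problem-solving example viewed as
# CPT data vs. SFT data. CPT trains on every token of the concatenated text;
# SFT masks the prompt so only the solution tokens contribute to the loss.
from transformers import AutoTokenizer

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss

def build_cpt_example(tok, problem, solution, max_len=1024):
    """CPT view: plain next-token prediction over the whole text."""
    text = f"Problem: {problem}\nSolution: {solution}"
    ids = tok(text, truncation=True, max_length=max_len)["input_ids"]
    return {"input_ids": ids, "labels": list(ids)}

def build_sft_example(tok, problem, solution, max_len=1024):
    """SFT view: instruction pair with prompt tokens masked out of the loss."""
    prompt = f"Problem: {problem}\nSolution: "
    prompt_ids = tok(prompt)["input_ids"]
    full_ids = tok(prompt + solution, truncation=True, max_length=max_len)["input_ids"]
    # Assumes the prompt tokenization is a prefix of the full tokenization
    # (a common simplification; real pipelines align the boundary explicitly).
    labels = [IGNORE_INDEX] * len(prompt_ids) + full_ids[len(prompt_ids):]
    return {"input_ids": full_ids, "labels": labels}

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")  # any causal-LM tokenizer works
    cpt = build_cpt_example(tok, "Compute 3 + 4 * 2.", "4 * 2 = 8, so 3 + 8 = 11.")
    sft = build_sft_example(tok, "Compute 3 + 4 * 2.", "4 * 2 = 8, so 3 + 8 = 11.")
    print(len(cpt["labels"]), sum(l == IGNORE_INDEX for l in sft["labels"]))
```

In this toy form, the only difference between the two views is the label mask, which is what allows the same problem-solving corpus to be reused in either training stage.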
Related papers
- RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel approach for synthesizing rational and verifiable mathematical reasoning data.
It generates solution graphs by composing Python-formatted functions from a structured function library.
Based on the constructed graph, it achieves solution-guided, logic-aware problem generation (a toy illustration follows below).
arXiv Detail & Related papers (2025-04-29T04:42:02Z)
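As a rough, assumption-laden illustration of the RV-Syn idea summarized above, the sketch below composes Python functions from a tiny library into an expression graph that serves as a verifiable solution, then renders a problem statement from that graph. The library entries, graph sampler, and wording templates are invented for illustration and are not taken from the RV-Syn paper.

```python
# Illustrative sketch only: compose library functions into a small "solution graph",
# execute it to obtain a verifiable answer, then render a problem from the graph.
import operator
import random

FUNCTION_LIBRARY = {
    "add": operator.add,
    "mul": operator.mul,
    "sub": operator.sub,
}

def sample_solution_graph(depth=2):
    """Compose library functions into a tiny expression tree (the solution graph)."""
    if depth == 0:
        return random.randint(1, 20)
    name = random.choice(list(FUNCTION_LIBRARY))
    return (name, sample_solution_graph(depth - 1), sample_solution_graph(depth - 1))

def evaluate(node):
    """Execute the graph to obtain a verifiable answer."""
    if isinstance(node, int):
        return node
    name, left, right = node
    return FUNCTION_LIBRARY[name](evaluate(left), evaluate(right))

def render_problem(node):
    """Solution-guided problem generation: turn the graph into a question."""
    if isinstance(node, int):
        return str(node)
    name, left, right = node
    words = {"add": "the sum of", "mul": "the product of", "sub": "the difference of"}
    return f"{words[name]} {render_problem(left)} and {render_problem(right)}"

if __name__ == "__main__":
    graph = sample_solution_graph()
    print(f"What is {render_problem(graph)}?", "Answer:", evaluate(graph))
```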
- MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion [48.443460251524776]
MathFusion is a novel framework that enhances mathematical reasoning through cross-problem instruction synthesis.
MathFusion achieves substantial improvements in mathematical reasoning while maintaining high data efficiency.
arXiv Detail & Related papers (2025-03-20T15:00:41Z)
- Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models [0.0]
Large models pre-trained on high-quality data exhibit excellent performance across various reasoning tasks.
Smaller student models learn from teacher models through distillation and from data augmentation, such as rephrasing questions.
Despite these efforts, smaller models struggle with arithmetic computations, leading to errors in mathematical reasoning.
arXiv Detail & Related papers (2025-02-18T13:43:06Z)
- MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task [49.355810887265925]
We introduce MathFimer, a novel framework for mathematical reasoning step expansion.
We develop a specialized model, MathFimer-7B, trained on our carefully curated NuminaMath-FIM dataset.
We then apply it to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains (a toy fill-in-the-middle example follows below).
arXiv Detail & Related papers (2025-02-17T11:22:24Z)
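To illustrate the fill-in-the-middle idea behind MathFimer in the simplest possible terms, the sketch below holds out one interior step of an existing solution chain and forms a training example whose target is the missing step. The tag format and splitting heuristic are assumptions, not the MathFimer recipe.

```python
# Toy fill-in-the-middle (FIM) example built from an existing solution chain.
# The <prefix>/<suffix>/<middle> tags and the splitting heuristic are illustrative.
import random
from typing import Dict, List

def make_fim_example(question: str, solution_steps: List[str]) -> Dict[str, str]:
    """Hold out one interior step; the model learns to reconstruct it."""
    assert len(solution_steps) >= 3, "need at least prefix, middle, and suffix steps"
    mid = random.randint(1, len(solution_steps) - 2)
    prefix = " ".join(solution_steps[:mid])
    middle = solution_steps[mid]
    suffix = " ".join(solution_steps[mid + 1:])
    prompt = (
        f"Question: {question}\n"
        f"<prefix> {prefix} </prefix>\n"
        f"<suffix> {suffix} </suffix>\n"
        f"<middle>"
    )
    return {"prompt": prompt, "target": f" {middle} </middle>"}

if __name__ == "__main__":
    steps = [
        "Rewrite 3 + 4 * 2 using operator precedence.",
        "Compute 4 * 2 = 8.",
        "Add 3 + 8 = 11.",
    ]
    ex = make_fim_example("What is 3 + 4 * 2?", steps)
    print(ex["prompt"])
    print(ex["target"])
```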
- Combining physics-based and data-driven models: advancing the frontiers of research with Scientific Machine Learning [3.912796219404492]
SciML is a research field which combines physics-based and data-driven models.
Data-driven models aim to extract relations between input and output data.
We discuss the successful application of SciML to the simulation of the human cardiac function.
arXiv Detail & Related papers (2025-01-30T19:09:38Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-time search method for large language models.
It formulates reasoning tasks as search problems and proposes two search ideas to identify optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models such as Llama-2-13B and Mistral-7B, achieving performance comparable to GPT-3.5 and Grok-1 (a generic step-level search sketch follows below).
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
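The MindStar summary above frames reasoning as inference-time search; the sketch below shows one generic way such step-level search can be organized, as a small beam search over partial reasoning paths. The propose_steps and score functions are stand-ins for an LLM sampler and a reward or verifier model, and nothing here is taken from the MindStar implementation.

```python
# Generic step-level beam search over partial reasoning paths (illustrative only).
from typing import List, Tuple

def propose_steps(question: str, partial: List[str]) -> List[str]:
    """Placeholder: a real system would sample candidate next steps from an LLM."""
    return [f"step-{len(partial) + 1}-candidate-{i}" for i in range(3)]

def score(question: str, partial: List[str]) -> float:
    """Placeholder: a real system would use a process reward model or verifier."""
    return -len(partial)  # dummy score; prefers shorter paths here

def beam_search_reasoning(question: str, beam_width: int = 2, max_steps: int = 4) -> List[str]:
    """Keep the top-scoring partial reasoning paths at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(max_steps):
        candidates = []
        for _, path in beams:
            for step in propose_steps(question, path):
                new_path = path + [step]
                candidates.append((score(question, new_path), new_path))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]

if __name__ == "__main__":
    print(beam_search_reasoning("Compute 3 + 4 * 2."))
```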
- Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks [34.09857430966818]
We introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks.
We conduct fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH.
Our experiments reveal that among the three models, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems.
arXiv Detail & Related papers (2024-04-19T08:45:42Z)
- SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models [4.090307917818891]
We focus on integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning.
We propose a sequential learning approach, named SAAS (Solving Ability Amplification Strategy), which strategically transitions from CoT learning to PoT learning.
arXiv Detail & Related papers (2024-04-05T04:25:47Z)
- Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning [110.80663974060624]
Key-Point-Driven Data Synthesis (KPDDS) is a novel data synthesis framework that synthesizes question-answer pairs.
KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability.
We present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs (a toy sketch of key-point-driven generation follows below).
arXiv Detail & Related papers (2024-03-04T18:58:30Z)
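As a toy illustration of what key-point-driven synthesis could look like, the sketch below pools "key points" from seed problems and assembles them into a generation prompt for a hypothetical model call. The extraction, prompt template, and generate_with_llm stub are all assumptions rather than the KPDDS pipeline.

```python
# Toy key-point-driven question-answer synthesis. The seed data, prompt template,
# and generate_with_llm stub are hypothetical; see the linked paper for KPDDS itself.
import itertools
import random
from typing import Dict, List

SEED_PROBLEMS = [
    {"problem": "Find the GCD of 18 and 24.", "key_points": ["gcd", "prime factorization"]},
    {"problem": "Solve x^2 - 5x + 6 = 0.", "key_points": ["quadratic equation", "factoring"]},
]

def sample_key_points(seeds: List[Dict], k: int = 2) -> List[str]:
    """Pool key points across seed problems and sample a combination."""
    pool = sorted(set(itertools.chain.from_iterable(s["key_points"] for s in seeds)))
    return random.sample(pool, k=min(k, len(pool)))

def build_synthesis_prompt(key_points: List[str]) -> str:
    """Assemble a prompt asking a generator model for a new question-answer pair."""
    joined = ", ".join(key_points)
    return (
        f"Write a new math problem that requires these key points: {joined}.\n"
        "Then give a step-by-step solution and a final answer."
    )

def generate_with_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call plus quality-control filtering."""
    return f"[model output for prompt: {prompt[:40]}...]"

if __name__ == "__main__":
    kps = sample_key_points(SEED_PROBLEMS)
    print(generate_with_llm(build_synthesis_prompt(kps)))
```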
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.