CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
- URL: http://arxiv.org/abs/2412.11699v1
- Date: Mon, 16 Dec 2024 12:21:11 GMT
- Title: CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
- Authors: Chengwei Wei, Bin Wang, Jung-jae Kim, Guimei Liu, Nancy F. Chen
- Abstract summary: Large Language Models (LLMs) have shown strong performance in solving mathematical problems. The best practice for leveraging coding instruction data to enhance mathematical reasoning remains underexplored. CoinMath generates a variety of code-based rationales incorporating concise comments, descriptive naming conventions, and hardcoded solutions.
- Score: 34.07259769892295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown strong performance in solving mathematical problems, with code-based solutions proving particularly effective. However, best practices for leveraging coding instruction data to enhance mathematical reasoning remain underexplored. This study investigates three key questions: (1) How do different coding styles of mathematical code-based rationales impact LLMs' learning performance? (2) Can general-domain coding instructions improve performance? (3) How does integrating textual rationales with code-based ones during training enhance mathematical reasoning abilities? Our findings reveal that code-based rationales with concise comments, descriptive naming, and hardcoded solutions are beneficial, while improvements from general-domain coding instructions and textual rationales are relatively minor. Based on these insights, we propose CoinMath, a learning strategy designed to enhance mathematical reasoning by diversifying the coding styles of code-based rationales. CoinMath generates a variety of code-based rationales incorporating concise comments, descriptive naming conventions, and hardcoded solutions. Experimental results demonstrate that CoinMath significantly outperforms its baseline model, MAmmoTH, one of the SOTA math LLMs.
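To make the three rationale styles concrete, here is a minimal illustrative sketch in Python; the toy word problem, function names, and numbers are our own invention, not examples from the paper:

```python
# Toy problem (invented): "Apples cost $3 each. Sam buys 4. How much does Sam spend?"

# Style 1: concise comments -- each step carries a short explanatory comment.
def solve_with_comments():
    price = 3              # each apple costs $3
    count = 4              # Sam buys 4 apples
    return price * count   # total amount spent

# Style 2: descriptive naming -- the variable names themselves carry the reasoning.
def solve_with_descriptive_names():
    cost_per_apple_in_dollars = 3
    number_of_apples_bought = 4
    total_dollars_spent = cost_per_apple_in_dollars * number_of_apples_bought
    return total_dollars_spent

# Style 3: hardcoded solution -- problem-specific values are computed inline
# instead of being passed in or derived from parsed inputs.
def solve_hardcoded():
    return 3 * 4  # $3 per apple times 4 apples

assert solve_with_comments() == solve_with_descriptive_names() == solve_hardcoded() == 12
```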
Related papers
- ClozeMath: Improving Mathematical Reasoning in Language Models by Learning to Fill Equations [29.51572057789961]
We propose a new approach named ClozeMath to fine-tune large language models for mathematical reasoning. ClozeMath involves a text-infilling task that predicts masked equations from a given solution, analogous to the cloze exercises used in human learning. Experiments on GSM8K, MATH, and GSM-Symbolic show that ClozeMath surpasses the strong baseline Masked Thought in performance and robustness.
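A rough sketch of what one such training pair might look like; the mask token, record layout, and word problem are assumptions for illustration, not the paper's actual format:

```python
# Hypothetical cloze-style pair for equation infilling (format invented here).
cloze_example = {
    "input": (
        "Question: A shirt costs $15 and a hat costs $8. What is the total cost?\n"
        "Solution: The total cost is [MASKED_EQUATION] = 23 dollars."
    ),
    "target": "15 + 8",  # the model is trained to fill in the masked equation
}
```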
arXiv Detail & Related papers (2025-06-04T09:27:21Z) - Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions [8.540135660509058]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding. We leverage influence functions to attribute LLMs' reasoning ability on math and coding to individual training examples, sequences, and tokens. High-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning.
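For reference, the classical influence-function approximation of Koh and Liang (2017), on which such attribution analyses are typically built, estimates how upweighting a training example $z$ changes the loss on a test example $z_{\text{test}}$; the paper may use a scalable variant of this formula:

$$
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top} H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta).
$$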
arXiv Detail & Related papers (2025-05-26T13:15:26Z) - MegaMath: Pushing the Limits of Open Math Corpora [44.148011362359036]
We present MegaMath, an open dataset curated from diverse, math-focused sources.
MegaMath delivers 371B tokens, the largest quantity and highest quality among existing open math pre-training datasets.
arXiv Detail & Related papers (2025-04-03T17:52:07Z) - MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code [38.127313175508746]
We introduce a novel method for generating mathematical code accompanied by corresponding reasoning steps for continued pretraining.
Our approach begins with the construction of a high-quality mathematical continued pretraining dataset.
Appending the generated code to each reasoning step results in data consisting of paired natural language reasoning steps and their corresponding code.
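As a loose illustration of a paired reasoning step and its appended code (the record fields and the example step are invented here):

```python
# Hypothetical reasoning-step/code pair in the style the summary describes.
paired_step = {
    "reasoning": (
        "The train travels at 60 km/h for 2.5 hours, "
        "so the distance is speed multiplied by time."
    ),
    "code": (
        "speed_km_per_h = 60\n"
        "hours = 2.5\n"
        "distance_km = speed_km_per_h * hours  # 150.0"
    ),
}
```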
arXiv Detail & Related papers (2024-10-10T17:58:40Z) - INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models [21.082464220284127]
We explore fundamental questions related to solving mathematical reasoning problems using natural language and code.
Our findings show that LLMs reason better in natural language than in code.
Although natural language and code serve as complementary forms of reasoning, they can negatively affect each other in certain scenarios.
arXiv Detail & Related papers (2024-09-28T15:12:55Z) - FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models [44.63505885248145]
FineMath is a fine-grained mathematical evaluation benchmark dataset for assessing Chinese Large Language Models (LLMs).
FineMath is created to cover the key mathematical concepts taught in elementary school math, divided into 17 categories of math word problems.
All 17 categories are manually annotated with difficulty levels according to the number of reasoning steps required to solve the problems.
arXiv Detail & Related papers (2024-03-12T15:32:39Z) - GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently observed piece of evidence is that LLMs can behave incorrectly when math questions are slightly changed.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z) - ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models [67.32868432113587]
This paper introduces ConceptMath, a fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs).
Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts.
arXiv Detail & Related papers (2024-02-22T16:06:49Z) - InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
We open-source InternLM-Math, math reasoning LLMs continually pre-trained from InternLM2.
We combine chain-of-thought reasoning, reward modeling, formal reasoning, data augmentation, and a code interpreter in a unified seq2seq format.
Our pre-trained model achieves 30.3 on the MiniF2F test set without fine-tuning.
arXiv Detail & Related papers (2024-02-09T11:22:08Z) - MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
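A rough sketch of what a code-integrated solution record might look like; the segment types and word problem below are our illustration rather than the paper's actual special-token format:

```python
# Hypothetical interleaved solution: natural language, code, and execution output.
solution_segments = [
    {"type": "text",
     "content": "Two notebooks and three pens cost $12; a notebook costs $3. "
                "Find the price of one pen."},
    {"type": "code",
     "content": "notebook_cost = 3\n"
                "pens_total = 12 - 2 * notebook_cost\n"
                "pen_cost = pens_total / 3"},
    {"type": "execution", "content": "pen_cost = 2.0"},
    {"type": "text", "content": "So each pen costs $2."},
]
```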
arXiv Detail & Related papers (2023-10-05T17:52:09Z) - JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike the texts in other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols, and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)