MAmmoTH: Building Math Generalist Models through Hybrid Instruction
Tuning
- URL: http://arxiv.org/abs/2309.05653v3
- Date: Tue, 3 Oct 2023 02:48:42 GMT
- Title: MAmmoTH: Building Math Generalist Models through Hybrid Instruction
Tuning
- Authors: Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu
Su, Wenhu Chen
- Abstract summary: We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
- Score: 60.208045804204076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MAmmoTH, a series of open-source large language models (LLMs)
specifically tailored for general math problem-solving. The MAmmoTH models are
trained on MathInstruct, our meticulously curated instruction tuning dataset.
MathInstruct is compiled from 13 math datasets with intermediate rationales,
six of which have rationales newly curated by us. It presents a unique hybrid
of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also
ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT
not only unleashes the potential of tool use but also allows different thought
processes for different math problems. As a result, the MAmmoTH series
substantially outperform existing open-source models on nine mathematical
reasoning datasets across all scales with an average accuracy gain between 16%
and 32%. Remarkably, our MAmmoTH-7B model reaches 33% on MATH (a
competition-level dataset), which exceeds the best open-source 7B model
(WizardMath) by 23%, and the MAmmoTH-34B model achieves 44% accuracy on MATH,
even surpassing GPT-4's CoT result. Our work underscores the importance of
diverse problem coverage and the use of hybrid rationales in developing
superior math generalist models.
Related papers
- Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models.
Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark.
We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z) - Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z) - MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z) - MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs [38.127313175508746]
MathGenie is a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset.
Various pretrained models, ranging from 7B to 70B, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique.
MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.
arXiv Detail & Related papers (2024-02-26T07:17:25Z) - MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z) - ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [170.7899683843177]
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems.
ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales.
ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
arXiv Detail & Related papers (2023-09-29T17:59:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.