OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
- URL: http://arxiv.org/abs/2402.10176v2
- Date: Sun, 03 Nov 2024 03:48:02 GMT
- Title: OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
- Authors: Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman,
- Abstract summary: We construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs.
The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks.
Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH.
- Score: 8.080585423915075
- License:
- Abstract: Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best gpt-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.
Related papers
- OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data [8.36384597713879]
The OpenMathInstruct-2 dataset consists of 14M question-solution pairs ($approx$ 600K unique questions)
Finetuning the textttLlama-3.1-8B-Base using OpenMathInstruct-2 outperforms textttLlama3.1-8B-Instruct on MATH by an absolute 15.9%.
To accelerate the open-source efforts, we release the code, the finetuned models, and the OpenMathInstruct-2 dataset under a commercially permissive license.
arXiv Detail & Related papers (2024-10-02T14:00:09Z) - InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning [58.7966588457529]
InfiMM-WebMath-40B is a high-quality dataset of interleaved image-text documents.
It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens, all meticulously extracted and filtered from CommonCrawl.
Our evaluations on text-only benchmarks show that, despite utilizing only 40 billion tokens, our dataset significantly enhances the performance of our 1.3B model.
Our models set a new state-of-the-art among open-source models on multi-modal math benchmarks such as MathVerse and We-Math.
arXiv Detail & Related papers (2024-09-19T08:41:21Z) - Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On [55.449818944278526]
We introduce the Skywork-Math model series, supervised fine-tuned (SFT) on common 7B language models.
Skywork-Math 7B has achieved impressive accuracies of 51.2% on the competition-level MATH benchmark.
We provide several practical takeaways to enhance math reasoning abilities in LLMs for both research and industry applications.
arXiv Detail & Related papers (2024-07-11T09:56:51Z) - DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning [24.68321102981711]
We introduce a series of large language models (LLMs) that employ the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath.
DotaMath models tackle complex mathematical tasks by decomposing them into simpler logical subtasks, leveraging code to solve these subtasks, and engaging in self-reflection and correction.
We train a series of base LLMs using imitation learning on DotaMathQA, resulting in DotaMath models that achieve remarkable performance compared to open-source LLMs.
arXiv Detail & Related papers (2024-07-04T17:39:16Z) - Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models [62.815222721144636]
We introduce Math-LLaVA, a LLaVA-1.5-based model fine-tuned with MathV360K.
This novel approach significantly improves the multimodal mathematical reasoning capabilities of LLaVA-1.5.
Math-LLaVA demonstrates enhanced generalizability, showing substantial improvements on the MMMU benchmark.
arXiv Detail & Related papers (2024-06-25T05:43:21Z) - MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning [11.426127461122908]
This work includes new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them.
Open Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities.
We propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code.
arXiv Detail & Related papers (2024-05-13T08:32:19Z) - MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible
Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z) - MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning [54.2093509928664]
In math reasoning with large language models, fine-tuning data augmentation by query evolution and diverse reasoning paths is empirically verified effective.
We conduct an investigation for such data augmentation in math reasoning and are intended to answer these questions.
We release our codes and augmented data in https://github.com/OFA-Sys/8k-Scel.
arXiv Detail & Related papers (2023-10-09T08:18:58Z) - MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical
Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z) - MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [91.66694225955872]
We propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning.
Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge.
We release all the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.
arXiv Detail & Related papers (2023-09-21T17:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.