Measuring Mathematical Problem Solving With the MATH Dataset
- URL: http://arxiv.org/abs/2103.03874v1
- Date: Fri, 5 Mar 2021 18:59:39 GMT
- Title: Measuring Mathematical Problem Solving With the MATH Dataset
- Authors: Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt
- Abstract summary: We introduce MATH, a dataset of 12,500 challenging competition mathematics problems.
Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many intellectual endeavors require mathematical problem solving, but this
skill remains beyond the capabilities of computers. To measure this ability in
machine learning models, we introduce MATH, a new dataset of 12,500 challenging
competition mathematics problems. Each problem in MATH has a full step-by-step
solution which can be used to teach models to generate answer derivations and
explanations. To facilitate future research and increase accuracy on MATH, we
also contribute a large auxiliary pretraining dataset which helps teach models
the fundamentals of mathematics. Even though we are able to increase accuracy
on MATH, our results show that accuracy remains relatively low, even with
enormous Transformer models. Moreover, we find that simply increasing budgets
and model parameter counts will be impractical for achieving strong
mathematical reasoning if scaling trends continue. While scaling Transformers
is automatically solving most other text-based tasks, scaling is not currently
solving MATH. To have more traction on mathematical problem solving we will
likely need new algorithmic advancements from the broader research community.
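The abstract reports accuracy on MATH but does not say how answers are graded. As a rough illustration only, the sketch below assumes that each step-by-step solution marks its final answer with LaTeX `\boxed{...}`, and it counts a prediction as correct when its boxed answer exactly matches the reference's boxed answer after whitespace is removed. The helper names `last_boxed`, `normalize`, and `accuracy` are hypothetical, and the paper's own answer-normalization rules are not reproduced here.

```python
from typing import List, Optional


def last_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a LaTeX solution, or None."""
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # already inside the opening brace of \boxed{
    out = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # matching close brace found
        out.append(ch)
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Hypothetical minimal normalization: drop all whitespace."""
    return "".join(answer.split())


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose boxed answer exactly matches the reference's."""
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = last_boxed(pred), last_boxed(ref)
        if p is not None and r is not None and normalize(p) == normalize(r):
            correct += 1
    return correct / len(references)


# Usage example with made-up strings (not data from MATH)
preds = [r"So the answer is $\boxed{\frac{1}{2}}$.", r"$\boxed{ 4 }$", "I am not sure."]
refs = [r"$\boxed{\frac{1}{2}}$", r"$\boxed{5}$", r"$\boxed{7}$"]
print(accuracy(preds, refs))  # 1 of 3 correct
```

Exact string matching undercounts answers written in equivalent forms (for example `0.5` versus `\frac{1}{2}`), so a fuller grader would need more normalization than this sketch performs.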
Related papers
- MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems
We propose an agent framework for learning to solve mathematical problems based on inductive reasoning.
By emulating the human learning process of generalizing from learned information, this framework performs strongly on mathematical reasoning.
Our model can be used as a personalised learning aid, thus reducing the inequality of educational resources.
(arXiv, 2024-08-03)
- AI-Assisted Generation of Difficult Math Questions
Current LLM training positions mathematical reasoning as a core capability.
There is unmet demand for diverse and challenging math questions.
We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach.
(arXiv, 2024-07-30)
- Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
We introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks.
We conduct fine-tuning experiments with three prominent large language models: LLaMA-2, WizardMath, and MAmmoTH.
Our experiments reveal that, of the three models, MAmmoTH-13B is the most proficient at solving the presented mathematical problems.
(arXiv, 2024-04-19)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
(arXiv, 2024-03-05)
- MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
We present MathSensei, a tool-augmented large language model for mathematical reasoning.
We study the complementary benefits of three tools: a knowledge retriever (Bing Web Search), a program generator and executor (Python), and a symbolic equation solver (Wolfram-Alpha API).
(arXiv, 2024-02-27)
- ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems.
ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales.
ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
(arXiv, 2023-09-29)
- A Survey of Deep Learning for Mathematical Reasoning
We review the key tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning over the past decade.
Recent advances in large-scale neural language models have opened up new benchmarks and opportunities to use deep learning for mathematical reasoning.
(arXiv, 2022-12-20)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
(arXiv, 2022-06-13)
- Reverse Operation based Data Augmentation for Solving Math Word Problems
Recent models have reached their performance bottleneck and require more high-quality data for training.
We propose a novel data augmentation method that reverses the mathematical logic of math word problems.
We apply the augmented data on two SOTA math word problem solving models and compare our results with a strong data augmentation baseline.
(arXiv, 2020-10-04)