GPT Can Solve Mathematical Problems Without a Calculator
- URL: http://arxiv.org/abs/2309.03241v2
- Date: Tue, 12 Sep 2023 11:01:25 GMT
- Title: GPT Can Solve Mathematical Problems Without a Calculator
- Authors: Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang
- Abstract summary: We show that, with sufficient training data, a 2 billion-parameter language model can perform multi-digit arithmetic operations with almost 100% accuracy without data leakage.
We also demonstrate that our MathGLM, fine-tuned from GLM-10B, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
- Score: 24.114064917059565
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Previous studies have typically assumed that large language models are unable
to accurately perform arithmetic operations, particularly multiplication of >8
digits, and operations involving decimals and fractions, without the use of
calculator tools. This paper aims to challenge this misconception. With
sufficient training data, a 2 billion-parameter language model can accurately
perform multi-digit arithmetic operations with almost 100% accuracy without
data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication
accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from
GLM-10B on a dataset with additional multi-step arithmetic operations and math
problems described in text, achieves similar performance to GPT-4 on a
5,000-samples Chinese math problem test set. Our code and data are public at
https://github.com/THUDM/MathGLM.
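The accuracy figures quoted in the abstract can be read as exact-match rates over generated arithmetic problems. The following is a minimal sketch of such a measurement, not the authors' evaluation code: the prompt format `a*b=`, the helper names, and the two stand-in predictors are all assumptions made for illustration, and a real run would replace `oracle` with a call to the model under test.

```python
import random


def make_multiplication_problems(n, digits, seed=0):
    """Generate n pairs of random integers, each with exactly `digits` digits."""
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]


def exact_match_accuracy(problems, predict):
    """Fraction of problems where the predicted string equals the true product."""
    correct = 0
    for a, b in problems:
        # Hypothetical prompt format; the paper's actual format is not given here.
        prediction = predict(f"{a}*{b}=").strip()
        if prediction == str(a * b):
            correct += 1
    return correct / len(problems)


def oracle(prompt):
    """Stand-in predictor that always answers correctly (for checking the harness)."""
    left, right = prompt.rstrip("=").split("*")
    return str(int(left) * int(right))


def last_digit_dropper(prompt):
    """Stand-in predictor that always loses the final digit of the product."""
    return oracle(prompt)[:-1]


problems = make_multiplication_problems(100, digits=9)
print(exact_match_accuracy(problems, oracle))
print(exact_match_accuracy(problems, last_digit_dropper))
```

Exact match is a strict metric: a prediction that is wrong in a single digit scores the same as one that is wrong everywhere, which is why multi-digit multiplication accuracy can be low even for otherwise capable models.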
Related papers
- Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks [27.020990219204343]
Large language models (LLMs) can correctly and confidently predict the first digit of n-digit by m-digit multiplication tasks.
LLMs in practice often fail to correctly or confidently predict the last digit of an n-digit by m-digit multiplication.
We show that the latter task can be solved more robustly when the LLM is conditioned on all of the correct higher-order digits.
arXiv Detail & Related papers (2024-06-04T14:34:39Z)
- OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step [7.7168728919692855]
We propose a framework that enables exact arithmetic in a single autoregressive step.
We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic.
Our implementation using Llama 3 with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations.
arXiv Detail & Related papers (2024-06-04T04:17:40Z)
- Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z)
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving.
However, their proficiency in solving mathematical problems remains inadequate.
We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
- Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities.
We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z)
- Solving the multiplication problem of a large language model system using a graph-based method [20.43440908151311]
ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems.
We developed a graph-based multiplication algorithm that emulated human-like numerical operations.
Our proposed algorithm attained 100% accuracy for 1,000,000 large number multiplication tasks.
arXiv Detail & Related papers (2023-10-18T08:02:00Z)
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations.
We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions.
This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
- MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning [60.208045804204076]
We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
arXiv Detail & Related papers (2023-09-11T17:47:22Z)
- WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math.
Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
- How well do Large Language Models perform in Arithmetic tasks? [25.638682874990206]
Large language models have shown emergent abilities, including chain-of-thought reasoning, that let them answer math word problems step by step.
To the best of our knowledge, no prior work focuses on evaluating the arithmetic ability of large language models.
In this work, we propose an arithmetic dataset, MATH 401, to test the latest large language models.
arXiv Detail & Related papers (2023-03-16T09:28:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.