Related papers: GPT Can Solve Mathematical Problems Without a Calculator

GPT Can Solve Mathematical Problems Without a Calculator

URL: http://arxiv.org/abs/2309.03241v2
Date: Tue, 12 Sep 2023 11:01:25 GMT
Title: GPT Can Solve Mathematical Problems Without a Calculator
Authors: Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang
Abstract summary: We show that a large language model can accurately perform arithmetic operations with almost 100% accuracy without data leakage. We also demonstrate that our MathGLM, fine-tuned from GLM-10B, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set.
Score: 24.114064917059565
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set. Our code and data are public at https://github.com/THUDM/MathGLM.

Related papers

IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently [17.525220958618988]
We introduce the Integrated Gated Calculator (IGC), a module that enables Large Language Models to perform arithmetic by emulating a calculator on the GPU. We finetune a Llama model with our module and test it on the BigBench Arithmetic benchmark, where it beats the State of the Art. Our approach takes only a single iteration to run and requires no external tools.
arXiv Detail & Related papers (2025-01-01T00:01:27Z)
Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks [27.020990219204343]
Large language models (LLMs) can correctly and confidently predict the first digit of n-digit by m-digit multiplication tasks. LLMs in practice often fail to correctly or confidently predict the last digit of an n-digit by m-digit multiplication. We show that the latter task can be solved more robustly when the LLM is conditioned on all of the correct higher-order digits.
arXiv Detail & Related papers (2024-06-04T14:34:39Z)
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step [7.7168728919692855]
We propose a framework that enables exact arithmetic in a single autoregressive step. We use the hidden states of a LLM to control a symbolic architecture that performs arithmetic. Our implementation using Llama 3 with OccamNet as a symbolic model (OccamLlama) achieves 100% accuracy on single arithmetic operations.
arXiv Detail & Related papers (2024-06-04T04:17:40Z)
Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities. The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data.
arXiv Detail & Related papers (2024-03-05T11:42:59Z)
Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities. We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z)
Solving the multiplication problem of a large language model system using a graph-based method [20.43440908151311]
ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems. We developed a graph-based multiplication algorithm that emulated human-like numerical operations. Our proposed algorithm attained 100% accuracy for 1,000,000 large number multiplication tasks.
arXiv Detail & Related papers (2023-10-18T08:02:00Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
We present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems.
arXiv Detail & Related papers (2023-10-05T17:52:09Z)
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning [60.208045804204076]
We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
arXiv Detail & Related papers (2023-09-11T17:47:22Z)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
We present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Our model even surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, simultaneously surpasses Text-davinci, PaLM-1 and GPT-3 on MATH.
arXiv Detail & Related papers (2023-08-18T14:23:21Z)
How well do Large Language Models perform in Arithmetic tasks? [25.638682874990206]
Large language models have emerged abilities including chain-of-thought to answer math word problems step by step. To the best of our knowledge, there is no work to focus on evaluating the arithmetic ability of large language models. In this work, we propose an arithmetic dataset MATH 401 to test the latest large language models.
arXiv Detail & Related papers (2023-03-16T09:28:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.