Related papers: How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs

URL: http://arxiv.org/abs/2410.13857v1
Date: Thu, 17 Oct 2024 17:59:35 GMT
Title: How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Authors: Guhao Feng, Kai Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Zhenguo Li, Liwei Wang
Abstract summary: We identify numerical precision as a key factor that influences Transformer-based Large Language Models' effectiveness in mathematical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes.
Score: 69.55103380185612
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the remarkable success of Transformer-based Large Language Models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of LLMs' mathematical abilities, with a specific focus on their arithmetic performances. We identify numerical precision as a key factor that influences their effectiveness in mathematical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes. We further support our theoretical findings through empirical experiments that explore the impact of varying numerical precision on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.
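As a rough floating-point analogy (not the paper's Transformer construction or its precision model), the sketch below shows why a small, fixed number of bits cannot track a running sum that keeps growing. It assumes IEEE 754 half precision stands in for "low precision" and rounds the accumulator after every addition, using only Python's standard `struct` module; the helper names `to_fp16` and `iterated_sum` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def iterated_sum(values, round_fn=None):
    """Add values left to right, optionally rounding the accumulator after each step."""
    acc = 0.0
    for v in values:
        acc = acc + v
        if round_fn is not None:
            acc = round_fn(acc)  # simulate storing the partial sum at low precision
    return acc


ones = [1.0] * 3000
print(iterated_sum(ones))           # 3000.0 with a float64 accumulator
print(iterated_sum(ones, to_fp16))  # 2048.0: the half-precision sum stalls
```

Half precision has an 11-bit significand, so above 2048 adjacent representable values are 2 apart and 2048 + 1 rounds back to 2048; the accumulator never advances again. Wider formats push this threshold out without removing it, which is loosely analogous to the abstract's contrast between low and standard numerical precision on iterated addition.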

Related papers

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers [64.1656365676171]
Task arithmetic refers to editing the pre-trained model by adding a weighted sum of task vectors. This paper theoretically proves the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, and of task negation in unlearning a task. It also proves that a proper selection of coefficients for task arithmetic achieves generalization to out-of-domain tasks.
arXiv Detail & Related papers (2025-04-15T08:04:39Z)
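For readers new to the setup, task arithmetic is usually written as θ_edit = θ_pre + Σ_i λ_i τ_i, where each task vector τ_i = θ_i − θ_pre is the difference between fine-tuned and pre-trained weights, and a negative λ_i corresponds to task negation. The sketch below applies this update to plain dictionaries of parameter lists; the function names and toy weights are hypothetical, and it is not the paper's code.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """tau = theta_finetuned - theta_pretrained, computed per parameter tensor."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def apply_task_arithmetic(pretrained: Params, vectors: List[Params], coeffs: List[float]) -> Params:
    """theta_edit = theta_pre + sum_i lambda_i * tau_i (a negative lambda negates a task)."""
    edited = {name: list(values) for name, values in pretrained.items()}  # copy, do not mutate
    for tau, lam in zip(vectors, coeffs):
        for name, delta in tau.items():
            edited[name] = [w + lam * d for w, d in zip(edited[name], delta)]
    return edited


theta_pre = {"w": [1.0, 2.0]}
theta_a = {"w": [2.0, 2.0]}  # toy weights "fine-tuned on task A"
theta_b = {"w": [1.0, 4.0]}  # toy weights "fine-tuned on task B"
taus = [task_vector(theta_pre, theta_a), task_vector(theta_pre, theta_b)]

print(apply_task_arithmetic(theta_pre, taus, [1.0, 1.0]))   # {'w': [2.0, 4.0]}  task addition
print(apply_task_arithmetic(theta_pre, taus, [-1.0, 0.0]))  # {'w': [0.0, 2.0]}  negate task A
```

The coefficients λ_i are exactly the "linear coefficients" whose selection the summary refers to; in practice they would be tuned rather than fixed at ±1.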
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges [0.0]
We introduce GSM-Ranges, a dataset generator that systematically perturbs numerical values in math problems to assess model robustness across varying numerical scales. We also propose a novel grading methodology that distinguishes between logical and non-logical errors, offering a more precise evaluation of reasoning processes beyond computational accuracy.
arXiv Detail & Related papers (2025-02-12T09:53:10Z)
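The summary does not specify how GSM-Ranges rewrites problems, so the following is only a hypothetical sketch of the general idea: take the numbers in a templated word problem, move them to a chosen magnitude, and recompute the reference answer so the problem's logic stays fixed while the numerical scale changes. The template, the `perturb` helper, and the scale factors are all illustrative assumptions.

```python
import random


def perturb(template, values, answer_fn, scale, seed=0):
    """Rescale each named number in a word problem and recompute its answer.

    Hypothetical sketch: every value v becomes v * scale plus a random offset
    in [0, scale), so the magnitude grows while the problem logic is unchanged.
    """
    rng = random.Random(seed)
    scaled = {k: v * scale + rng.randrange(scale) for k, v in values.items()}
    return template.format(**scaled), answer_fn(**scaled), scaled


template = "Ann has {a} apples and buys {b} more. How many apples does she have now?"
base = {"a": 3, "b": 5}

for scale in (1, 1_000, 1_000_000):
    question, answer, _ = perturb(template, base, lambda a, b: a + b, scale)
    print(question, "->", answer)
```

Because the answer is recomputed from the same function, a wrong model output on a large-scale variant points to a numerical rather than a logical failure, which is the kind of distinction the summary's grading methodology aims at.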
Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures [3.181878085746691]
Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting. We propose that LLMs learn arithmetic by capturing algebraic structures, such as Commutativity and Identity properties. Our findings indicate that leveraging algebraic structures can enhance the LLMs' arithmetic capabilities, offering insights into improving their arithmetic performance.
arXiv Detail & Related papers (2024-11-25T10:23:11Z)
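To make the two properties concrete, the hypothetical sketch below expands one addition problem into equations that exercise commutativity (a + b = b + a) and the additive identity (a + 0 = a). It only illustrates what these structures look like in arithmetic data; the summary does not say whether or how the paper builds such examples.

```python
def algebraic_variants(a: int, b: int) -> list:
    """Return equations exercising commutativity and the additive identity."""
    return [
        f"{a}+{b}={a + b}",
        f"{b}+{a}={a + b}",  # commutativity: swapping operands keeps the sum
        f"{a}+0={a}",        # identity: adding zero leaves a unchanged
        f"0+{b}={b}",
    ]


print(algebraic_variants(12, 7))
# ['12+7=19', '7+12=19', '12+0=12', '0+7=7']
```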
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines [7.695524275630717]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing and reasoning tasks. We propose a Composable Arithmetic Execution Framework (CAEF) that enables LLMs to learn to execute step-by-step computations by emulating Turing Machines. In our evaluation, CAEF achieves nearly 100% accuracy across seven common mathematical operations on the LLaMA 3.1-8B model.
arXiv Detail & Related papers (2024-10-10T13:23:49Z)
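The summary only says that CAEF has LLMs emulate Turing machines to execute computations step by step; its actual prompts and state format are not described here. As a loose, hypothetical illustration of that idea, the sketch below performs multi-digit addition as a sequence of explicit states (position, carry, digits written so far), where each transition reads one digit from each operand, writes one output digit, and updates the carry. The `AddState` type and the trace format are assumptions, not CAEF's representation.

```python
from dataclasses import dataclass, field


@dataclass
class AddState:
    pos: int = 0    # index of the current digit, counted from the right
    carry: int = 0  # carry held in the machine's state
    out: list = field(default_factory=list)  # output digits, least significant first


def step(state: AddState, a: str, b: str) -> AddState:
    """One transition: read a digit from each operand, write one digit, move left."""
    da = int(a[-1 - state.pos]) if state.pos < len(a) else 0
    db = int(b[-1 - state.pos]) if state.pos < len(b) else 0
    total = da + db + state.carry
    return AddState(state.pos + 1, total // 10, state.out + [total % 10])


def add_by_steps(a: str, b: str):
    """Run transitions until both operands are consumed and no carry remains."""
    state, trace = AddState(), []
    while state.pos < max(len(a), len(b)) or state.carry:
        state = step(state, a, b)
        trace.append((state.pos, state.carry, "".join(map(str, reversed(state.out)))))
    return "".join(map(str, reversed(state.out))) or "0", trace


result, trace = add_by_steps("987", "45")
print(result)  # 1032
print(trace)   # [(1, 1, '2'), (2, 1, '32'), (3, 1, '032'), (4, 0, '1032')]
```

Each step depends only on the current state and one digit from each operand, which is what makes this kind of execution easy to verify one transition at a time.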
Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications. In this work, we uncover a specific mechanism by which LLMs execute calculations. We investigate the potential benefits of selectively fine-tuning the attention heads and MLPs that are essential to these calculations to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
Performance Law of Large Language Models [58.32539851241063]
Performance law can be used to guide the choice of LLM architecture and the effective allocation of computational resources without extensive experiments.
arXiv Detail & Related papers (2024-08-19T11:09:12Z)
Dissecting Multiplication in Transformers: Insights into LLMs [23.109124772063574]
We focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain. We provide a comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication. We propose improvements to enhance transformers' performance on multiplication tasks.
arXiv Detail & Related papers (2024-07-22T04:07:26Z)
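To make the n-digit multiplication task concrete, it helps to write schoolbook multiplication at the digit level: output column k collects the products of every digit pair (i, j) with i + j = k, and carries from lower columns then ripple upward. Unlike addition, where column k depends only on digit k of each operand plus a carry, each multiplication column depends on many digit pairs. The sketch below is a plain reference implementation of that decomposition, not the paper's model or analysis; the function name is hypothetical.

```python
def multiply_digits(a: str, b: str) -> str:
    """Schoolbook multiplication of non-negative decimal strings."""
    da = [int(c) for c in reversed(a)]  # least significant digit first
    db = [int(c) for c in reversed(b)]
    # Column sums: position k gathers every product da[i] * db[j] with i + j == k.
    cols = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            cols[i + j] += x * y
    # Carry propagation: each output digit depends on the carries from all lower columns.
    # len(a) + len(b) columns always suffice, so no carry is left over at the end.
    digits, carry = [], 0
    for s in cols:
        carry, d = divmod(s + carry, 10)
        digits.append(d)
    return "".join(map(str, reversed(digits))).lstrip("0") or "0"


print(multiply_digits("1234", "5678"))  # 7006652
```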
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One frequently observed weakness is that LLMs can answer incorrectly when math questions are only slightly changed. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
Generating Mathematical Derivations with Large Language Models [2.363388546004777]
We leverage a symbolic engine to generate derivations of equations at scale. We investigate the capabilities of Large Language Models when deriving goal equations from premises.
arXiv Detail & Related papers (2023-07-19T14:13:02Z)
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models [96.9373147383119]
We show that weight disentanglement is the crucial factor that makes task arithmetic effective. We show that fine-tuning models in their tangent space by linearizing them amplifies weight disentanglement. This leads to substantial performance improvements across task arithmetic benchmarks and diverse models.
arXiv Detail & Related papers (2023-05-22T08:39:25Z)
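Linearizing a model means replacing it with its first-order Taylor expansion around the pre-trained weights θ_0, so fine-tuning only moves along the tangent space at θ_0. Written out in our own notation (f(x; θ) is the network output, τ_i a task vector, α_i its coefficient):

```latex
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + (\theta - \theta_0)^{\top}\nabla_{\theta} f(x;\theta_0),
\qquad
\tau_i = \theta_i^{*} - \theta_0,
\qquad
f_{\mathrm{lin}}\Big(x;\theta_0 + \sum_i \alpha_i \tau_i\Big) = f(x;\theta_0) + \sum_i \alpha_i\, \tau_i^{\top}\nabla_{\theta} f(x;\theta_0).
```

Because f_lin is affine in θ, the effects of several task vectors on the output add exactly, which is one way to see why working in the tangent space can make task arithmetic better behaved.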
Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems. Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.