Language Models Encode Numbers Using Digit Representations in Base 10
- URL: http://arxiv.org/abs/2410.11781v2
- Date: Sun, 02 Feb 2025 20:27:54 GMT
- Title: Language Models Encode Numbers Using Digit Representations in Base 10
- Authors: Amit Arnold Levy, Mor Geva
- Abstract summary: We show that large language models (LLMs) make errors when handling simple numerical problems.
LLMs internally represent numbers with a separate circular representation per digit in base 10.
This digit-wise representation sheds light on models' error patterns on tasks involving numerical reasoning.
- Score: 12.913172023910203
- Abstract: Large language models (LLMs) frequently make errors when handling even simple numerical problems, such as comparing two small numbers. A natural hypothesis is that these errors stem from how LLMs represent numbers, and specifically, whether their representations of numbers capture their numeric values. We tackle this question from the observation that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally around its numeric value. Through a series of probing experiments and causal interventions, we show that LLMs internally represent numbers with individual circular representations per-digit in base 10. This digit-wise representation, as opposed to a value representation, sheds light on the error patterns of models on tasks involving numerical reasoning and could serve as a basis for future studies on analyzing numerical mechanisms in LLMs.
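To make the claim concrete, here is a minimal sketch (not the authors' code; the function name and feature layout are assumptions) of what a circular per-digit code in base 10 looks like: each digit is mapped to a (cos, sin) point on the unit circle, which is the kind of target a linear probe could be trained to recover from a model's hidden states.

```python
import numpy as np

def circular_digit_features(n: int, num_digits: int = 4) -> np.ndarray:
    """Map each base-10 digit of n onto a point on the unit circle.

    Illustrative only: one (cos, sin) pair per digit, so digits that wrap
    around (e.g. 9 and 0) stay close on the circle.
    """
    digits = [(n // 10**k) % 10 for k in range(num_digits)]  # least-significant digit first
    angles = 2 * np.pi * np.array(digits) / 10.0
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)  # shape (num_digits, 2)

# Example: 472 and 473 differ only in the units digit's position on the circle.
print(circular_digit_features(472))
print(circular_digit_features(473))
```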
Related papers
- FoNE: Precise Single-Token Number Embeddings via Fourier Features [51.17846016593835]
We propose a novel method that maps numbers into the embedding space with their Fourier features.
FoNE encodes each number as a single token with only two embedding dimensions per digit, effectively capturing numerical values without fragmentation.
On 6-digit decimal addition, FoNE requires 64$\times$ less data than subword and digit-wise embeddings to achieve 99% accuracy.
FoNE is the only method that yields 100% accuracy on over 100,000 test examples for addition, subtraction, and multiplication.
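Based only on the description above, a rough sketch of a Fourier-style number embedding with two dimensions (cos, sin) per digit might look as follows; the actual FoNE construction may differ in detail.

```python
import numpy as np

def fourier_number_embedding(x: float, num_digits: int = 6) -> np.ndarray:
    """Rough sketch of a Fourier-style number embedding.

    Two dimensions (cos, sin) per digit position, with periods
    10, 100, ..., 10**num_digits: the k-th pair completes one full cycle
    every 10**k units, so together the pairs pin down the number's digits.
    Based only on the abstract; the real FoNE implementation may differ.
    """
    periods = 10.0 ** np.arange(1, num_digits + 1)
    angles = 2 * np.pi * x / periods
    return np.concatenate([np.cos(angles), np.sin(angles)])  # shape (2 * num_digits,)

print(fourier_number_embedding(123456.0).round(3))
```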
arXiv Detail & Related papers (2025-02-13T19:54:59Z)
- Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges [0.0]
We introduce GSM-Ranges, a dataset generator that systematically perturbs numerical values in math problems to assess model robustness across varying numerical scales.
We also propose a novel grading methodology that distinguishes between logical and non-logical errors, offering a more precise evaluation of reasoning processes beyond computational accuracy.
arXiv Detail & Related papers (2025-02-12T09:53:10Z)
- Why Do Large Language Models (LLMs) Struggle to Count Letters? [2.8367942280334493]
Large Language Models (LLMs) have achieved unprecedented performance on many complex tasks.
Yet they struggle with seemingly simple tasks, such as counting the occurrences of letters in a word.
arXiv Detail & Related papers (2024-12-19T22:47:08Z)
- How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs [69.55103380185612]
We identify numerical precision as a key factor that influences Transformer-based Large Language Models' effectiveness in mathematical tasks.
Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication.
In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes.
arXiv Detail & Related papers (2024-10-17T17:59:35Z)
- NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning [27.584258258635945]
Language models struggle with handling numerical data and performing arithmetic operations.
We propose a simple adjustment to how numbers are represented by including the count of digits before each number.
Requiring the model to consider the number of digits first enhances its reasoning before it generates the actual number.
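A minimal sketch of the digit-count idea is shown below; the `{count:number}` tag format here is illustrative and may not match the paper's exact encoding.

```python
import re

def add_digit_counts(text: str) -> str:
    """Prefix every integer in the text with its digit count.

    Illustrative tag format only; the point is that the model sees how
    many digits a number has before the digits themselves.
    """
    return re.sub(r"\d+", lambda m: f"{{{len(m.group())}:{m.group()}}}", text)

print(add_digit_counts("Add 128 and 74."))  # -> "Add {3:128} and {2:74}."
```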
arXiv Detail & Related papers (2024-03-30T19:46:59Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently observed piece of evidence is that LLMs can behave incorrectly when the math questions are slightly changed.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- Language Models Encode the Value of Numbers Linearly [28.88044346200171]
We study how language models encode the value of numbers, a basic element in math.
Experimental results support the existence of encoded number values in large language models.
Our research provides evidence that LLMs encode the value of numbers linearly.
arXiv Detail & Related papers (2024-01-08T08:54:22Z)
- Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities.
We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z)
- Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models [4.412336603162406]
Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text.
In this work, we investigate how well popular LLMs capture the magnitudes of numbers from a behavioral lens.
arXiv Detail & Related papers (2023-05-18T07:50:44Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), which use an LLM to read natural language problems and generate programs as intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results on all 12 benchmarks.
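A minimal sketch of the program-aided pattern follows; `call_llm` is a hypothetical placeholder for an actual model call, not PAL's real prompting setup.

```python
# The language model writes the reasoning as Python; the interpreter
# (not the model) computes the final answer.

def call_llm(prompt: str) -> str:
    # Placeholder: in PAL this would be a few-shot prompted LLM returning code.
    return "apples = 23\neaten = 9\nanswer = apples - eaten"

def program_aided_answer(question: str):
    program = call_llm(f"Write Python that solves:\n{question}\n")
    scope: dict = {}
    exec(program, scope)  # offload the solution step to the interpreter
    return scope.get("answer")

print(program_aided_answer("There are 23 apples and 9 are eaten. How many are left?"))  # 14
```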
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
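The mantissa/exponent decomposition that such an embedding operates on can be sketched as follows (illustrative only; NumGPT's prototype-based embeddings themselves are not reproduced here).

```python
import math

def mantissa_exponent(x: float) -> tuple[float, int]:
    """Split a number into a base-10 mantissa in [1, 10) and an integer exponent.

    In a NumGPT-style setup the mantissa would feed one embedding and the
    exponent a separate embedding table; this shows only the decomposition.
    """
    if x == 0:
        return 0.0, 0
    exponent = math.floor(math.log10(abs(x)))
    mantissa = x / 10**exponent
    return mantissa, exponent

print(mantissa_exponent(6371.0))  # roughly (6.371, 3)
print(mantissa_exponent(0.025))   # roughly (2.5, -2)
```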
arXiv Detail & Related papers (2021-09-07T15:06:12Z)