Modular Arithmetic: Language Models Solve Math Digit by Digit
- URL: http://arxiv.org/abs/2508.02513v1
- Date: Mon, 04 Aug 2025 15:18:41 GMT
- Title: Modular Arithmetic: Language Models Solve Math Digit by Digit
- Authors: Tanja Baeumel, Daniil Gurgurov, Yusser al Ghussin, Josef van Genabith, Simon Ostermann
- Abstract summary: We present evidence for the existence of digit-position-specific circuits that Large Language Models (LLMs) use to perform arithmetic tasks. Using Feature Importance and Causal Interventions, we identify and validate the digit-position-specific circuits. Our interventions selectively alter the model's prediction at targeted digit positions, demonstrating the causal role of digit-position circuits in solving arithmetic tasks.
- Score: 9.827634698754014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent work has begun to uncover the internal strategies that Large Language Models (LLMs) employ for simple arithmetic tasks, a unified understanding of their underlying mechanisms is still lacking. We extend recent findings showing that LLMs represent numbers in a digit-wise manner and present evidence for the existence of digit-position-specific circuits that LLMs use to perform simple arithmetic tasks, i.e. modular subgroups of MLP neurons that operate independently on different digit positions (units, tens, hundreds). Notably, such circuits exist independently of model size and of tokenization strategy, i.e. both for models that encode longer numbers digit-by-digit and as one token. Using Feature Importance and Causal Interventions, we identify and validate the digit-position-specific circuits, revealing a compositional and interpretable structure underlying the solving of arithmetic problems in LLMs. Our interventions selectively alter the model's prediction at targeted digit positions, demonstrating the causal role of digit-position circuits in solving arithmetic tasks.
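As a concrete illustration of the kind of causal intervention described in the abstract, here is a minimal activation-patching sketch, not the authors' code: activations of a hypothetical subgroup of MLP neurons are copied from a corrupted prompt into a clean run, and only the targeted digit position of the prediction is expected to move. The stand-in model (GPT-2), the layer index, and the neuron indices are assumptions made purely for illustration.

```python
# Minimal activation-patching sketch (illustrative; not the authors' code).
# Assumptions: GPT-2 as a stand-in model, an arbitrarily chosen layer, and a
# hypothetical subgroup of MLP neurons treated as the "tens-position" circuit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

LAYER = 8                      # layer to intervene on (assumption)
TENS_NEURONS = [11, 42, 97]    # hypothetical digit-position-specific neurons
mlp_act = model.transformer.h[LAYER].mlp.act   # post-GELU MLP activations

clean   = "Q: 123 + 456 = A:"
corrupt = "Q: 123 + 486 = A:"  # differs from `clean` only in a tens digit

def cache_acts(prompt):
    """Run the model and cache the MLP activations of the chosen layer."""
    store = {}
    handle = mlp_act.register_forward_hook(
        lambda mod, inp, out: store.update(act=out.detach().clone()))
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return store["act"]

corrupt_act = cache_acts(corrupt)

def patch(mod, inp, out):
    # Overwrite only the chosen neurons at the final token with activations
    # taken from the corrupted prompt; everything else is left untouched.
    out[:, -1, TENS_NEURONS] = corrupt_act[:, -1, TENS_NEURONS]
    return out

handle = mlp_act.register_forward_hook(patch)
with torch.no_grad():
    logits = model(**tok(clean, return_tensors="pt")).logits
handle.remove()

# If the patched neurons really form a tens-position circuit, only the tens
# digit of the predicted answer should shift toward the corrupted prompt.
print(tok.decode(logits[0, -1].argmax().item()))
```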
Related papers
- When can isotropy help adapt LLMs' next word prediction to numerical domains? [53.98633183204453]
It is shown that the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations. Experiments show that different characteristics of numerical data and model architectures have different impacts on isotropy.
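For context, isotropy describes how uniformly embedding directions are spread. A rough check, which is a simplified stand-in rather than the paper's exact metric, is the average pairwise cosine similarity: near 0 for an isotropic point cloud, near 1 for a narrow cone.

```python
# Rough isotropy check on synthetic embeddings (illustrative only).
import numpy as np

def mean_pairwise_cosine(emb: np.ndarray) -> float:
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    return float(sims[~np.eye(len(emb), dtype=bool)].mean())

rng = np.random.default_rng(0)
isotropic   = rng.normal(size=(500, 64))   # directions spread uniformly
anisotropic = isotropic + 5.0              # one shared dominant direction
print(mean_pairwise_cosine(isotropic))     # close to 0
print(mean_pairwise_cosine(anisotropic))   # close to 1
```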
arXiv Detail & Related papers (2025-05-22T05:10:34Z)
- Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures [2.8311048083168657]
Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting. We propose that LLMs learn arithmetic by capturing algebraic structures, such as commutativity and identity properties.
arXiv Detail & Related papers (2024-11-25T10:23:11Z)
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics [43.86518549293703]
Our experimental results show that large language models (LLMs) perform arithmetic using neither robust algorithms nor memorization, but instead rely on a bag of heuristics.
arXiv Detail & Related papers (2024-10-28T17:59:06Z)
- Language Models are Symbolic Learners in Arithmetic [8.34588487873447]
Large Language Models (LLMs) are thought to struggle with arithmetic learning due to inherent differences between language modeling and numerical computation.
We first investigate whether LLMs leverage partial products during arithmetic learning.
We find that although LLMs can identify some partial products after learning, they fail to leverage them when solving arithmetic tasks.
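For readers unfamiliar with the term, partial products are the per-digit intermediate results of long multiplication; a small worked example, included purely for illustration:

```python
# Partial products in long multiplication: 23 * 47 decomposes into the
# per-digit products that this line of work probes for.
a, b = 23, 47
partials = [(da * db) * 10 ** (i + j)
            for i, da in enumerate(map(int, reversed(str(a))))
            for j, db in enumerate(map(int, reversed(str(b))))]
print(partials)       # [21, 120, 140, 800]  ->  3*7, 3*40, 20*7, 20*40
print(sum(partials))  # 1081 == 23 * 47
```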
arXiv Detail & Related papers (2024-10-21T01:57:16Z)
- How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs [69.55103380185612]
We identify numerical precision as a key factor that influences Transformer-based large language models' arithmetic performances. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes.
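A toy illustration of why precision matters for iterated addition (not the paper's experiment): in float16, once the running sum is large enough, adding 1.0 falls below the representable gap and the sum stalls.

```python
# Iterated addition in low vs. standard precision (toy illustration).
# In float16 the gap between consecutive values at magnitude 2048 is 2.0,
# so adding 1.0 no longer changes the running sum.
import numpy as np

def iterated_add(dtype, steps=4096, inc=1.0):
    total = np.asarray(0.0, dtype=dtype)
    for _ in range(steps):
        total = total + np.asarray(inc, dtype=dtype)
    return total

print(iterated_add(np.float16))  # 2048.0 -- stalls, wrong
print(iterated_add(np.float32))  # 4096.0 -- exact
```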
arXiv Detail & Related papers (2024-10-17T17:59:35Z)
- Language Models Encode Numbers Using Digit Representations in Base 10 [12.913172023910203]
We show that large language models (LLMs) make errors when handling simple numerical problems. LLMs internally represent numbers with individual circular representations per-digit in base 10. This digit-wise representation sheds light on the error patterns of models on tasks involving numerical reasoning.
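A minimal sketch of what such a circular per-digit base-10 representation looks like, as an illustration of the reported finding rather than the probing code: each digit d is placed on a circle at angle 2πd/10, so digits adjacent modulo 10, such as 9 and 0, end up geometrically close, matching off-by-one digit errors.

```python
# Circular (base-10) encoding of a single digit position: digit d maps to a
# point on the unit circle at angle 2*pi*d/10 (illustrative only).
import numpy as np

def circular_digit(d: int) -> np.ndarray:
    theta = 2 * np.pi * (d % 10) / 10
    return np.array([np.cos(theta), np.sin(theta)])

# Distances wrap around: 9 is as close to 0 as 4 is to 5.
print(np.linalg.norm(circular_digit(9) - circular_digit(0)))  # ~0.618
print(np.linalg.norm(circular_digit(4) - circular_digit(5)))  # ~0.618
print(np.linalg.norm(circular_digit(0) - circular_digit(5)))  # 2.0 (opposite)
```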
arXiv Detail & Related papers (2024-10-15T17:00:15Z)
- Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
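A hedged sketch of what selectively fine-tuning a few "essential" MLP blocks can look like in practice; GPT-2 is a stand-in model and the layer indices below are assumptions, whereas the cited paper locates the relevant components empirically.

```python
# Selective fine-tuning sketch (illustrative): freeze everything, then
# unfreeze only the MLP blocks flagged as arithmetic-relevant.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
ESSENTIAL_MLP_LAYERS = [7, 8, 9]   # hypothetical "calculation" layers

for p in model.parameters():
    p.requires_grad = False
for i in ESSENTIAL_MLP_LAYERS:
    for p in model.transformer.h[i].mlp.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.1%} of parameters")
```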
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
- Reverse That Number! Decoding Order Matters in Arithmetic Learning [49.5504492920404]
Our work introduces a novel strategy that reevaluates the digit order by prioritizing output from the least significant digit.
Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training.
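The core of the reversed-decoding-order idea, shown as a simplified illustration rather than the paper's training pipeline: the answer is written least-significant digit first, so each output digit can depend on carries already generated, and the digits are un-reversed at inference time.

```python
# Least-significant-digit-first target formatting (simplified illustration).
def reverse_digits(n: int) -> str:
    return str(n)[::-1]

a, b = 3867, 145
target = reverse_digits(a + b)     # "2104": units digit of 4012 comes first
print(f"{a}+{b}=", target)         # the model is trained to emit "2104"
print(int(target[::-1]))           # un-reverse at inference time -> 4012
```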
arXiv Detail & Related papers (2024-03-09T09:04:53Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study in-context learning (ICL) through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Language Models Encode the Value of Numbers Linearly [28.88044346200171]
We study how language models encode the value of numbers, a basic element in math.
Experimental results support the existence of encoded number values in large language models.
Our research provides evidence that LLMs encode the value of numbers linearly.
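A minimal sketch of the kind of linear probe such a claim implies; the synthetic hidden states below stand in for real model activations, which is an assumption. If value is encoded linearly, a least-squares probe trained on some numbers generalizes to held-out, larger numbers.

```python
# Linear-probe sketch: hidden state h(n) = n * w + noise is a synthetic
# stand-in for real LLM activations; ordinary least squares recovers the
# value direction and extrapolates to unseen numbers.
import numpy as np

rng = np.random.default_rng(0)
d = 128
w = rng.normal(size=d)
numbers = np.arange(0, 1000)
H = numbers[:, None] * w + rng.normal(scale=0.1, size=(1000, d))

train, test = numbers < 800, numbers >= 800
coef, *_ = np.linalg.lstsq(H[train], numbers[train], rcond=None)
pred = H[test] @ coef
print(np.abs(pred - numbers[test]).mean())   # small error on unseen values
```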
arXiv Detail & Related papers (2024-01-08T08:54:22Z)
- Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism towards language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
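The classic form of such a vector-arithmetic mechanism, shown with made-up vectors purely for illustration rather than directions extracted from a real LM: a relation is applied by adding a single offset vector, and the answer is read out by nearest neighbor.

```python
# Toy word2vec-style relational arithmetic (vectors constructed for
# illustration): country -> capital is a shared offset, so applying the
# relation is vector addition and the answer is the nearest neighbor.
import numpy as np

rng = np.random.default_rng(0)
capital_offset = rng.normal(size=32)
E = {}
for country, capital in [("Poland", "Warsaw"), ("France", "Paris"), ("Japan", "Tokyo")]:
    E[country] = rng.normal(size=32)
    E[capital] = E[country] + capital_offset + 0.05 * rng.normal(size=32)

relation = E["Warsaw"] - E["Poland"]          # estimate the offset from one pair
query = E["France"] + relation                # "capital of France"
candidates = [w for w in E if w != "France"]  # exclude the query word itself
print(max(candidates, key=lambda w: np.dot(E[w], query)
          / (np.linalg.norm(E[w]) * np.linalg.norm(query))))  # -> Paris
```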
arXiv Detail & Related papers (2023-05-25T15:04:01Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)