The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
- URL: http://arxiv.org/abs/2410.13194v2
- Date: Sat, 08 Feb 2025 08:31:33 GMT
- Title: The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
- Authors: Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui
- Abstract summary: This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering questions involving numeric comparisons.
Using partial least squares regression, we first identify these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts.
- Score: 22.31258265337828
- Abstract: This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering questions involving numeric comparisons, e.g., "Was Cristiano born before Messi?" Using partial least squares regression, we first identify these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts. We then demonstrate causality by intervening in these subspaces to manipulate hidden states, thereby altering the LLM's comparison outcomes. Experiments on three different LLMs show that our results hold across different numerical attributes, indicating that LLMs utilize this linearly encoded information for numerical reasoning.
Related papers
- Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs).
We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.
We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
- Demystifying Singular Defects in Large Language Models [61.98878352956125]
In large language models (LLMs), the underlying causes of high-norm tokens remain largely unexplored.
We provide both theoretical insights and empirical validation across a range of recent models.
We showcase two practical applications of these findings: the improvement of quantization schemes and the design of LLM signatures.
arXiv Detail & Related papers (2025-02-10T20:09:16Z)
- A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension [16.671316494925346]
This study investigates the effects of supervised fine-tuning (SFT) and in-context learning (ICL) on the hidden representations of Large Language Models (LLMs).
We first explore how the intrinsic dimension (ID) of LLM representations evolves during SFT and how it varies with the number of demonstrations in ICL.
We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID compared to SFT.
arXiv Detail & Related papers (2024-12-09T06:37:35Z)
- Language Models Encode Numbers Using Digit Representations in Base 10 [12.913172023910203]
We show that large language models (LLMs) make errors when handling simple numerical problems.
LLMs internally represent numbers digit by digit, with a circular representation for each base-10 digit.
This digit-wise representation sheds light on the error patterns of models on tasks involving numerical reasoning.
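A toy model of such a digit-wise circular encoding is easy to write down. The function name and circle layout below are illustrative, not the probes used in the paper; the point is that on a circle the digit 9 sits next to 0, which would make wrap-around errors like 199 vs. 200 "small" in representation space.

```python
import numpy as np

def digit_circle_embedding(n: int, n_digits: int = 3) -> np.ndarray:
    """Encode each base-10 digit of n as a point on a unit circle
    (a toy version of the per-digit circular features reported for LLMs)."""
    digits = [(n // 10**i) % 10 for i in range(n_digits)]  # least-significant first
    angles = [2 * np.pi * d / 10 for d in digits]
    # For each digit, emit (cos, sin) of its angle on the 10-position circle.
    return np.array([f(a) for a in angles for f in (np.cos, np.sin)])

# On the circle, digit 9 is exactly as close to 0 as 0 is to 1.
e9, e0, e1 = (digit_circle_embedding(x, 1) for x in (9, 0, 1))
dist_9_0 = np.linalg.norm(e9 - e0)
dist_0_1 = np.linalg.norm(e0 - e1)
```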
arXiv Detail & Related papers (2024-10-15T17:00:15Z)
- Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models [14.594698598522797]
Demonstrating feature universality allows discoveries about latent representations to generalize across several models.
We employ a method known as dictionary learning to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features.
Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.
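The dictionary-learning step can be illustrated with scikit-learn's DictionaryLearning on synthetic activations; the dimensions, sparsity level, and noise scale below are invented for the sketch and are not taken from the paper, which trains sparse autoencoders on real LLM activations.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Synthetic "activations": each sample is a sparse mix of a few ground-truth
# feature directions, standing in for residual-stream activations of an LLM.
d_model, n_features, n_samples = 32, 16, 300
features = rng.normal(size=(n_features, d_model))
mask = rng.random((n_samples, n_features)) < 0.1          # ~10% of features active
codes_true = rng.random((n_samples, n_features)) * mask
activations = codes_true @ features + 0.01 * rng.normal(size=(n_samples, d_model))

# Dictionary learning recovers a basis whose sparse codes are the
# candidate interpretable features.
dl = DictionaryLearning(n_components=n_features, transform_algorithm="lasso_lars",
                        transform_alpha=0.05, random_state=0, max_iter=50)
codes = dl.fit_transform(activations)
sparsity = np.mean(codes != 0)        # fraction of nonzero code entries
```

Feature universality would then be tested by matching the learned dictionaries of two different models, which this sketch does not attempt.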
arXiv Detail & Related papers (2024-10-09T15:18:57Z)
- Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce hyperbolic low-rank efficient fine-tuning (HypLoRA), a new method that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
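One standard way to compute a local intrinsic dimension is the Levina-Bickel maximum-likelihood estimator, sketched here on synthetic data (the paper's exact LID estimator and data may differ):

```python
import numpy as np

def lid_mle(x: np.ndarray, data: np.ndarray, k: int = 20) -> float:
    """Levina-Bickel maximum-likelihood estimate of the local intrinsic
    dimension of `data` around the query point `x`."""
    dists = np.sort(np.linalg.norm(data - x, axis=1))
    r = dists[1:k + 1]                     # skip the zero distance of x to itself
    return (k - 1) / np.sum(np.log(r[-1] / r[:-1]))

rng = np.random.default_rng(0)
# 1000 points on a 2-D plane embedded in 50-D: the LID should come out
# near 2, not near the ambient dimension 50.
coords = rng.normal(size=(1000, 2))
basis = np.linalg.qr(rng.normal(size=(50, 2)))[0]      # orthonormal 50x2 basis
data = coords @ basis.T
estimate = lid_mle(data[0], data)
```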
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Language Models Encode the Value of Numbers Linearly [28.88044346200171]
We study how language models encode the value of numbers, a basic element in math.
Experimental results support the existence of encoded number values in large language models.
Our research provides evidence that LLMs encode the value of numbers linearly.
arXiv Detail & Related papers (2024-01-08T08:54:22Z)
- Why do Nearest Neighbor Language Models Work? [93.71050438413121]
Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context.
Retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore.
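The kNN-LM recipe behind retrieval-augmented LMs of this kind interpolates the base LM distribution with a nearest-neighbor distribution built from (context vector, next token) pairs. The datastore, interpolation weight, and temperature below are toy values, not the paper's configuration.

```python
import numpy as np

def knn_lm_probs(query, keys, values, p_lm, vocab_size, lam=0.25, temperature=1.0):
    """p(y|x) = lam * p_kNN(y|x) + (1 - lam) * p_LM(y|x), with p_kNN built
    from a softmax over negative distances to the datastore keys."""
    d = np.linalg.norm(keys - query, axis=1)
    w = np.exp(-d / temperature)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values, w)        # tokens retrieved more than once accumulate
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
vocab, dim = 10, 8
keys = rng.normal(size=(100, dim))           # hypothetical datastore of context vectors
values = rng.integers(0, vocab, size=100)    # next token paired with each key
p_lm = np.full(vocab, 1 / vocab)             # uniform base LM, for the sketch only
p = knn_lm_probs(keys[0], keys, values, p_lm, vocab)
```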
arXiv Detail & Related papers (2023-01-07T11:12:36Z)
- Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets [47.20862716252927]
We use manifold learning to compare the intrinsic geometric structures of different datasets.
We define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric.
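The log-Euclidean metric being lower-bounded can be illustrated directly on covariance matrices, where it is the Frobenius norm of the difference of matrix logarithms. The datasets and jitter term below are synthetic; the paper's construction for unaligned datasets is more involved.

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Log-Euclidean distance between two symmetric positive-definite
    matrices: ||logm(a) - logm(b)||_F."""
    return float(np.linalg.norm(logm(a) - logm(b), "fro"))

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
y = x @ np.diag([1.0, 2.0, 0.5, 1.5])        # a rescaled copy of the same data
# Small jitter keeps the covariances safely positive-definite.
cov_x = np.cov(x, rowvar=False) + 1e-6 * np.eye(4)
cov_y = np.cov(y, rowvar=False) + 1e-6 * np.eye(4)
d_self = log_euclidean_distance(cov_x, cov_x)
d_xy = log_euclidean_distance(cov_x, cov_y)
```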
arXiv Detail & Related papers (2022-02-03T16:37:23Z)
- The Low-Dimensional Linear Geometry of Contextualized Word Representations [27.50785941238007]
We study the linear geometry of contextualized word representations in ELMo and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
arXiv Detail & Related papers (2021-05-15T00:58:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.