The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
- URL: http://arxiv.org/abs/2410.13194v1
- Date: Thu, 17 Oct 2024 03:44:11 GMT
- Title: The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
- Authors: Ahmed Oumar El-Shangiti, Tatsuya Hiraoka, Hilal AlQuabeh, Benjamin Heinzerling, Kentaro Inui
- Abstract summary: This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering logical comparison questions.
We first identified these subspaces using partial least squares regression, which effectively encodes the numerical attributes associated with the entities in comparison prompts.
- Score: 22.31258265337828
- License:
- Abstract: This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering logical comparison questions (e.g., Was Cristiano born before Messi?). We first identified these subspaces using partial least squares regression, which effectively encodes the numerical attributes associated with the entities in comparison prompts. Further, we demonstrate causality by intervening in these subspaces to manipulate hidden states, thereby altering the LLM's comparison outcomes. Experimental results show that our findings hold for different numerical attributes, indicating that LLMs utilize the linearly encoded information for numerical reasoning.
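A minimal sketch of that two-step recipe is shown below, using random stand-in activations; the array sizes, the number of PLS components, the choice of the first PLS direction, and the intervention scale alpha are illustrative assumptions rather than the paper's actual configuration.
```python
# Minimal sketch, with random stand-in data, of: (1) identifying a subspace that
# linearly encodes a numerical attribute via PLS regression on hidden states, and
# (2) intervening along the leading PLS direction to change the decoded value.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, d_model, n_components = 500, 1024, 4              # illustrative sizes
hidden_states = rng.normal(size=(n, d_model))        # stand-in for entity hidden states
values = rng.uniform(1900, 2000, size=n)             # stand-in attribute (e.g., birth years)

# Step 1: fit the attribute subspace with partial least squares regression.
pls = PLSRegression(n_components=n_components)
pls.fit(hidden_states, values)

# Step 2: shift a hidden state along the first PLS direction. In an intervention
# experiment the edited state is written back into the model so that the decoded
# value, and hence the comparison answer, changes; the sign and scale of alpha
# have to be chosen empirically.
w = pls.x_weights_[:, 0]
w = w / np.linalg.norm(w)

def intervene(h, alpha):
    return h + alpha * w

h = hidden_states[0]
print("decoded before:", float(pls.predict(h[None, :]).ravel()[0]))
print("decoded after :", float(pls.predict(intervene(h, alpha=50.0)[None, :]).ravel()[0]))
```
With real activations the decoded value moves systematically with alpha, which is the kind of effect the intervention experiments test for; with the random stand-in data here the script only demonstrates the mechanics.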
Related papers
- Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning, HypLoRA, that performs low-rank adaptation directly on the hyperbolic manifold.
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
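As a concrete illustration, the sketch below computes LID with the Levina-Bickel maximum-likelihood estimator over k-nearest-neighbor distances; whether this matches the paper's exact estimator is an assumption, and the Gaussian "activations" are placeholders.
```python
# Sketch: Levina-Bickel MLE estimate of local intrinsic dimension (LID),
# lid(x) = -1 / mean_j log(d_j / d_k) over the k nearest-neighbor distances
# d_1 <= ... <= d_k of x, computed here on placeholder activations.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lid_mle(points, k=20):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)        # column 0 is the query point itself
    dists = dists[:, 1:]                    # (n, k), sorted ascending
    return -1.0 / np.log(dists[:, :-1] / dists[:, -1:]).mean(axis=1)

activations = np.random.default_rng(0).normal(size=(2000, 64))   # stand-in activations
print("mean LID:", lid_mle(activations).mean())
```
Per-example LID values of this kind are the sort of feature one could feed to a truthfulness predictor.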
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Language Models Know the Value of Numbers [28.88044346200171]
We study whether language models know the value of numbers, a basic element in math.
Experimental results support the existence of encoded number values in large language models.
Our research provides evidence that LLMs know the value of numbers, thus offering insights for better exploring, designing, and utilizing numeric information in LLMs.
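A sketch of the kind of probing experiment such evidence rests on is below: a linear probe trained to read a number's value back out of hidden states. The GPT-2 model, the prompt template, and the ridge probe are assumptions made to keep the example self-contained, not the paper's protocol.
```python
# Sketch of a value-probing experiment: regress a number's value from the hidden
# state at its prompt position with a linear probe.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

numbers = np.arange(1, 301)
feats = []
with torch.no_grad():
    for n in numbers:
        ids = tok(f"The number {n}", return_tensors="pt")
        feats.append(model(**ids).last_hidden_state[0, -1].numpy())  # last-token state

X_train, X_test, y_train, y_test = train_test_split(np.stack(feats), numbers, random_state=0)
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", probe.score(X_test, y_test))   # high R^2 = value is linearly decodable
```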
arXiv Detail & Related papers (2024-01-08T08:54:22Z)
- More than Correlation: Do Large Language Models Learn Causal Representations of Space? [6.293100288400849]
This study focused on whether the spatial representations in large language models are causal rather than merely correlational.
Experiments showed that the spatial representations influenced the model's performance on next word prediction and a downstream task that relies on geospatial information.
arXiv Detail & Related papers (2023-12-26T01:27:29Z)
- Representation Of Lexical Stylistic Features In Language Models' Embedding Space [28.60690854046176]
We show that it is possible to derive a vector representation for each of several lexical stylistic notions from only a small number of seed pairs.
We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases.
The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space.
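The seed-pair idea can be sketched with static embeddings as below: average the difference vectors of a few contrasting pairs to obtain a style direction, then score words by projection onto it. The specific pairs, the "complexity" framing, and the GloVe vectors loaded through gensim are illustrative assumptions.
```python
# Sketch of deriving a stylistic direction from a handful of seed pairs of static
# word embeddings, then scoring new words by projection onto that direction.
import numpy as np
import gensim.downloader as api

emb = api.load("glove-wiki-gigaword-100")   # static word embeddings

seed_pairs = [("difficult", "easy"), ("complex", "simple"), ("obscure", "clear")]
direction = np.mean([emb[a] - emb[b] for a, b in seed_pairs], axis=0)
direction /= np.linalg.norm(direction)

def style_score(word):
    v = emb[word]
    return float(np.dot(v / np.linalg.norm(v), direction))  # higher = more "complex" here

for w in ["arcane", "plain"]:
    print(w, round(style_score(w), 3))
```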
arXiv Detail & Related papers (2023-05-29T23:44:26Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner-workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Why do Nearest Neighbor Language Models Work? [93.71050438413121]
Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using it to predict the next word.
Retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore.
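A toy sketch of the nearest-neighbor construction behind such retrieval-augmented LMs is below: a datastore maps stored context representations to their observed next tokens, and a distribution over the retrieved tokens is interpolated with the base LM's distribution. The shapes, temperature, and mixing weight are illustrative assumptions.
```python
# Toy sketch of a nearest-neighbor LM: softmax over negative retrieval distances,
# placed on the retrieved next tokens, interpolated with the base LM distribution.
import numpy as np

def knn_lm_probs(p_lm, query, keys, next_tokens, vocab_size, k=8, lam=0.25, temp=1.0):
    dists = np.linalg.norm(keys - query, axis=1)     # distance to every stored context
    idx = np.argsort(dists)[:k]                      # k nearest contexts
    logits = -dists[idx] / temp
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, next_tokens[idx], weights)      # retrieval distribution over the vocab
    return lam * p_knn + (1.0 - lam) * p_lm          # interpolated next-token distribution

rng = np.random.default_rng(0)
V, d, n = 50, 16, 200
p = knn_lm_probs(np.full(V, 1.0 / V), rng.normal(size=d),
                 rng.normal(size=(n, d)), rng.integers(0, V, size=n), V)
print(p.sum())                                       # still a valid distribution (sums to 1)
```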
arXiv Detail & Related papers (2023-01-07T11:12:36Z)
- Multilevel orthogonal Bochner function subspaces with applications to robust machine learning [1.533771872970755]
We consider the data as instances of a random field within a relevant Bochner space.
Our key observation is that the classes can predominantly reside in two distinct subspaces.
arXiv Detail & Related papers (2021-10-04T22:01:01Z)
- The Low-Dimensional Linear Geometry of Contextualized Word Representations [27.50785941238007]
We study the linear geometry of contextualized word representations in ELMo and BERT.
We show that a variety of linguistic features are encoded in low-dimensional subspaces.
arXiv Detail & Related papers (2021-05-15T00:58:08Z)
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection [51.041763676948705]
Iterative Null-space Projection (INLP) is a novel method for removing information from neural representations.
We show that our method is able to mitigate bias in word embeddings, as well as to increase fairness in a setting of multi-class classification.
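A compact sketch of the iterative projection loop is below; the toy data, the logistic-regression attribute classifier, and the iteration count are illustrative choices rather than the paper's experimental setup.
```python
# Sketch of the INLP loop: fit a linear classifier for the protected attribute,
# project the representations onto that classifier's nullspace, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_iters=10):
    d = X.shape[1]
    P = np.eye(d)                                    # accumulated guarding projection
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X, z)
        w = clf.coef_ / np.linalg.norm(clf.coef_)    # (1, d) attribute direction
        P_null = np.eye(d) - w.T @ w                 # projection onto its nullspace
        X = X @ P_null
        P = P_null @ P
    return X, P

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 32)) + 2.0 * z[:, None]    # attribute leaks into every dimension
X_clean, _ = inlp(X, z)
acc = LogisticRegression(max_iter=1000).fit(X_clean, z).score(X_clean, z)
print(acc)   # far below the near-perfect probe accuracy attainable before projection
```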
arXiv Detail & Related papers (2020-04-16T14:02:50Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.