Number Representations in LLMs: A Computational Parallel to Human Perception
- URL: http://arxiv.org/abs/2502.16147v1
- Date: Sat, 22 Feb 2025 08:44:29 GMT
- Title: Number Representations in LLMs: A Computational Parallel to Human Perception
- Authors: H. V. AlquBoj, Hilal AlQuabeh, Velibor Bojkovic, Tatsuya Hiraoka, Ahmed Oumar El-Shangiti, Munachiso Nwadike, Kentaro Inui
- Abstract summary: We investigate whether large language models (LLMs) exhibit a similar logarithmic-like structure in their internal numerical representations. Our findings reveal that the model's numerical representations exhibit sublinear spacing, with distances between values aligning with a logarithmic scale.
- Score: 17.769013342964794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are believed to perceive numbers on a logarithmic mental number line, where smaller values are represented with greater resolution than larger ones. This cognitive bias, supported by neuroscience and behavioral studies, suggests that numerical magnitudes are processed in a sublinear fashion rather than on a uniform linear scale. Inspired by this hypothesis, we investigate whether large language models (LLMs) exhibit a similar logarithmic-like structure in their internal numerical representations. By analyzing how numerical values are encoded across different layers of LLMs, we apply dimensionality reduction techniques such as PCA and PLS followed by geometric regression to uncover latent structures in the learned embeddings. Our findings reveal that the model's numerical representations exhibit sublinear spacing, with distances between values aligning with a logarithmic scale. This suggests that LLMs, much like humans, may encode numbers in a compressed, non-uniform manner.
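A minimal sketch of the extraction-and-reduction step described above. The model name (`gpt2`), the layer index, and mean-pooling over a number's tokens are illustrative assumptions rather than the paper's setup, and plain PCA stands in for the paper's PCA/PLS plus geometric-regression pipeline.

```python
# Hypothetical sketch: collect hidden states for the integers 1..100 from a causal LM
# and project them onto a single principal component for later spacing analysis.
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"   # placeholder; the paper analyzes larger LLMs
layer = 6             # assumed intermediate layer; the paper examines multiple layers

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True).eval()

numbers = np.arange(1, 101)
reps = []
with torch.no_grad():
    for n in numbers:
        ids = tok(str(n), return_tensors="pt")
        hidden = model(**ids).hidden_states[layer][0]  # (seq_len, d_model)
        reps.append(hidden.mean(dim=0).numpy())        # pool over the number's tokens
reps = np.stack(reps)                                  # (100, d_model)

# One-dimensional projection of the number representations; PLS against the numeric
# value would be the supervised alternative mentioned in the abstract.
proj = PCA(n_components=1).fit_transform(reps).ravel()
```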
Related papers
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [79.01538178959726]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence.
We introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces [22.31258265337828]
This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of the embedding space when answering questions involving numeric comparisons. We first identified, using partial least squares regression, these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts.
arXiv Detail & Related papers (2024-10-17T03:44:11Z) - Language Models Encode Numbers Using Digit Representations in Base 10 [12.913172023910203]
We show that large language models (LLMs) make errors when handling simple numerical problems.
LLMs internally represent numbers with individual circular representations per digit in base 10.
This digit-wise representation sheds light on the error patterns of models on tasks involving numerical reasoning.
arXiv Detail & Related papers (2024-10-15T17:00:15Z) - Language Models Encode the Value of Numbers Linearly [28.88044346200171]
We study how language models encode the value of numbers, a basic element in math.
Experimental results support the existence of encoded number values in large language models.
Our research provides evidence that LLMs encode the value of numbers linearly (a minimal sketch contrasting linear and logarithmic fits appears after this list).
arXiv Detail & Related papers (2024-01-08T08:54:22Z) - Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models [4.412336603162406]
Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text.
In this work, we investigate how well popular LLMs capture the magnitudes of numbers from a behavioral lens.
arXiv Detail & Related papers (2023-05-18T07:50:44Z) - Learning Discretized Neural Networks under Ricci Flow [48.47315844022283]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations. DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws [24.356906682593532]
We study the compute-optimal trade-off between model and training data set sizes for large neural networks.
Our result suggests a linear relation similar to that supported by the empirical analysis of Chinchilla.
arXiv Detail & Related papers (2022-12-02T18:46:41Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs [53.710405006523274]
This work focuses on the representation learning question in reinforcement learning: how can low-dimensional features of states and actions be learned?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
arXiv Detail & Related papers (2020-06-18T19:11:18Z) - Interpolation and Learning with Scale Dependent Kernels [91.41836461193488]
We study the learning properties of nonparametric ridge-less least squares.
We consider the common case of estimators defined by scale dependent kernels.
arXiv Detail & Related papers (2020-06-17T16:43:37Z)
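To relate the abstract's sublinear-spacing claim to the linear-encoding finding cited above, a small follow-on check (reusing the `numbers` and `proj` arrays from the sketch after the abstract; both the check and the variable names are assumptions, not code from either paper) compares how well a linear versus a logarithmic scale explains the one-dimensional projection.

```python
# Hypothetical comparison: fit proj ~ a*x + b with x = n and x = log(n), report R^2.
import numpy as np

def r2(x, y):
    """Coefficient of determination of an ordinary least-squares line fit."""
    X = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

print("R^2 against n     :", r2(numbers.astype(float), proj))
print("R^2 against log(n):", r2(np.log(numbers.astype(float)), proj))
# Sublinear (logarithmic-like) spacing would appear as a higher R^2 for log(n).
```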