Deep sequence models tend to memorize geometrically; it is unclear why
- URL: http://arxiv.org/abs/2510.26745v1
- Date: Thu, 30 Oct 2025 17:40:22 GMT
- Title: Deep sequence models tend to memorize geometrically; it is unclear why
- Authors: Shahriar Noroozizadeh, Vaishnavh Nagarajan, Elan Rosenfeld, Sanjiv Kumar
- Abstract summary: We argue that the rise of such a geometry, despite optimizing over mere local associations, cannot be straightforwardly attributed to typical architectural or optimizational pressures. We demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures.
- Score: 42.53849315139079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In sequence modeling, the parametric memory of atomic facts has been predominantly abstracted as a brute-force lookup of co-occurrences between entities. We contrast this associative view against a geometric view of how memory is stored. We begin by isolating a clean and analyzable instance of Transformer reasoning that is incompatible with memory as strictly a storage of the local co-occurrences specified during training. Instead, the model must have somehow synthesized its own geometry of atomic facts, encoding global relationships between all entities, including non-co-occurring ones. This in turn has simplified a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn 1-step geometric task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, despite optimizing over mere local associations, cannot be straightforwardly attributed to typical architectural or optimizational pressures. Counterintuitively, an elegant geometry is learned even when it is not more succinct than a brute-force lookup of associations. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points practitioners to visible headroom for making Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery, and unlearning.
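To make the abstract's contrast concrete, below is a minimal sketch (our own illustration, not the paper's code) on a toy successor task: training stores only the local co-occurrences (i, i+1), yet a spectral embedding (the kind of structure Node2Vec-style random-walk objectives are known to approximate) places all entities on a single axis, so an $\ell$-fold composition collapses to one geometric step. The entity count, the path-graph setup, and all function names here are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): associative lookup vs.
# a synthesized global geometry on a toy successor task.
import numpy as np

n = 12  # number of entities in the toy world

# Associative view: memory is a table of the local co-occurrences seen in
# training -- here, only the atomic facts "i is followed by i+1".
succ = {i: i + 1 for i in range(n - 1)}

def compose_by_lookup(x, l):
    """Answer 'what is l steps after x?' by chaining l stored lookups."""
    for _ in range(l):
        x = succ[x]
    return x

# Geometric view: embed entities via the graph spectrum. We use Laplacian
# eigenvectors directly for transparency; Node2Vec-style objectives are
# known to recover closely related spectral structure.
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0  # the same local facts, as path-graph edges
L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
_, eigvecs = np.linalg.eigh(L)
coord = eigvecs[:, 1]                # Fiedler vector: monotone on a path graph
if coord[0] > coord[-1]:             # eigenvector sign is arbitrary; orient it
    coord = -coord

# The spectrum places ALL entities on one axis -- a global relationship that
# was never stated as a training co-occurrence.
order = np.argsort(coord)
rank = {int(v): r for r, v in enumerate(order)}

def compose_geometrically(x, l):
    """The l-fold composition collapses to one step in the 1-D geometry."""
    return int(order[rank[x] + l])

for x, l in [(2, 5), (0, 7), (3, 8)]:
    assert compose_by_lookup(x, l) == compose_geometrically(x, l)
print("entity coordinates:", np.round(coord, 3))
```

On a path graph the second Laplacian eigenvector is strictly monotone, which is why a single coordinate suffices here; the paper's observation is that Transformers trained only on the local pairs appear to arrive at a comparably global geometry.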
Related papers
- Visual Diffusion Models are Geometric Solvers [54.31602846693932]
We show that visual diffusion models can serve as effective geometric solvers by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry. We extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem.
arXiv Detail & Related papers (2025-10-24T17:57:31Z) - Deep Learning as the Disciplined Construction of Tame Objects [0.9786690381850356]
One can see deep learning as compositions of functions within so-called tame geometry. In this note, we give an overview of tame geometry (also known as o-minimality) and deep learning theory.
arXiv Detail & Related papers (2025-09-22T17:00:40Z) - Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations [50.05281461410368]
We introduce GeometrE, a geometric embedding method for multi-hop reasoning. It does not require learning the logical operations and enables full geometric interpretability. Our experiments show that GeometrE outperforms current state-of-the-art methods on standard benchmark datasets.
arXiv Detail & Related papers (2025-05-18T11:17:50Z) - From Dionysius Emerges Apollo -- Learning Patterns and Abstractions from Perceptual Sequences [1.3597551064547502]
A sensory stream, simplified, is a one-dimensional sequence. In learning such sequences, we naturally segment them into parts -- a process known as chunking. I developed models that learn chunks and parse sequences chunk by chunk.
arXiv Detail & Related papers (2025-03-14T00:37:28Z) - Unraveling the geometry of visual relational reasoning [11.82509693248749]
Humans readily generalize abstract relations, such as recognizing "constant" in shape or color, whereas neural networks struggle, limiting their flexible reasoning. We introduce SimplifiedRPM, a novel benchmark for systematically evaluating abstract relational reasoning. We also conduct human experiments to quantify relational difficulty, enabling direct model-human comparisons. Our results establish a geometric foundation for relational reasoning, paving the way for more human-like visual reasoning in AI.
arXiv Detail & Related papers (2025-02-24T18:07:54Z) - Slow Perception: Let's Perceive Geometric Figures Step-by-step [53.69067976062474]
We believe accurate copying (strong perception) is the first step to visual o1. We introduce the concept of "slow perception" (SP), which guides the model to gradually perceive basic point-line combinations.
arXiv Detail & Related papers (2024-12-30T00:40:35Z) - Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images [56.86175251327466]
We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context.
Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints.
Our method unifies depth and surface normal estimations within a cohesive framework, which enables the generation of high-quality 3D geometry from images.
arXiv Detail & Related papers (2024-02-08T17:57:59Z) - Exploring Data Geometry for Continual Learning [64.4358878435983]
We study continual learning from a novel perspective by exploring data geometry for the non-stationary stream of data.
Our method dynamically expands the geometry of the underlying space to match growing geometric structures induced by new data.
Experiments show that our method achieves better performance than baseline methods designed in Euclidean space.
arXiv Detail & Related papers (2023-04-08T06:35:25Z) - Geometric Algebra Attention Networks for Small Point Clouds [0.0]
Problems in the physical sciences deal with relatively small sets of points in two- or three-dimensional space.
We present rotation- and permutation-equivariant architectures for deep learning on these small point clouds.
We demonstrate the usefulness of these architectures by training models to solve sample problems relevant to physics, chemistry, and biology.
arXiv Detail & Related papers (2021-10-05T22:52:12Z) - On the geometry of generalization and memorization in deep neural networks [15.250162344382051]
We study the structure of when and where memorization occurs in a deep network.
All layers preferentially learn from examples that share features, and we link this behavior to generalization performance.
We find that memorization predominately occurs in the deeper layers, due to decreasing object manifolds' radius and dimension.
arXiv Detail & Related papers (2021-05-30T19:07:33Z)