Depth-Wise Emergence of Prediction-Centric Geometry in Large Language Models
- URL: http://arxiv.org/abs/2602.04931v1
- Date: Wed, 04 Feb 2026 11:00:16 GMT
- Title: Depth-Wise Emergence of Prediction-Centric Geometry in Large Language Models
- Authors: Shahar Haim, Daniel C McNamee
- Abstract summary: We show that decoder-only large language models exhibit a transition from context-processing to prediction-forming phases of computation. Using a unified framework combining geometric analysis with mechanistic intervention, we demonstrate that late-layer representations implement a structured geometric code that enables selective causal control over token prediction.
- Score: 1.0742675209112622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that decoder-only large language models exhibit a depth-wise transition from context-processing to prediction-forming phases of computation accompanied by a reorganization of representational geometry. Using a unified framework combining geometric analysis with mechanistic intervention, we demonstrate that late-layer representations implement a structured geometric code that enables selective causal control over token prediction. Specifically, angular organization of the representation geometry parametrizes prediction distributional similarity, while representation norms encode context-specific information that does not determine prediction. Together, these results provide a mechanistic-geometric account of the dynamics of transforming context into predictions in LLMs.
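The abstract's division of labor between angle and norm can be illustrated with a toy numpy sketch (hypothetical matrices and dimensions, not the paper's actual setup): when a final RMS normalization precedes the unembedding, as in many decoder-only LLMs, the next-token distribution depends only on the direction of the residual-stream vector, while its norm is divided out.

```python
import numpy as np

def rms_norm(h, eps=1e-6):
    # RMSNorm rescales h to unit root-mean-square, discarding its overall norm
    return h / np.sqrt(np.mean(h**2) + eps)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, vocab = 16, 10
W_U = rng.standard_normal((vocab, d))  # toy unembedding matrix

h = rng.standard_normal(d)
h_scaled = 3.7 * h                     # same direction, different norm

p1 = softmax(W_U @ rms_norm(h))
p2 = softmax(W_U @ rms_norm(h_scaled))

# The final normalization makes the prediction depend only on direction:
assert np.allclose(p1, p2)
```

In this simplified picture, any context information stored purely in the norm is invisible to the token prediction, consistent with the abstract's claim that norms encode context-specific information that does not determine prediction.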
Related papers
- TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z)
- Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture [0.0]
We show that self-attention emerges from projecting corpus-level co-occurrence statistics into sequence context. Our analysis demonstrates that the Transformer architecture's particular algebraic form follows from these projection principles.
arXiv Detail & Related papers (2025-11-16T02:25:04Z)
- CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees.
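For context, the standard split conformal procedure that such geometric corrections build on can be sketched as follows (a generic sketch with toy residuals, not the CP$^2$ method itself):

```python
import numpy as np

def split_conformal_quantile(cal_scores, alpha):
    """Split conformal prediction: given calibration nonconformity scores
    |y - yhat|, return the quantile q such that intervals yhat +/- q cover
    fresh points with probability >= 1 - alpha."""
    n = len(cal_scores)
    # finite-sample corrected quantile level
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(level, 1.0), method="higher")

rng = np.random.default_rng(2)
cal_scores = np.abs(rng.standard_normal(1000))  # toy calibration residuals
q = split_conformal_quantile(cal_scores, alpha=0.1)
# q approximates the 90th percentile of |N(0, 1)|
```

The coverage guarantee assumes exchangeability between calibration and test data, which is exactly what geometric shifts break and what canonicalization aims to restore.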
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
- Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings [67.5600169375126]
We study the task of panoptic symbol spotting in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representation. We propose VecFormer, a novel method that addresses these challenges through line-based representation of primitives.
arXiv Detail & Related papers (2025-05-29T12:33:11Z)
- Sparsification and Reconstruction from the Perspective of Representation Geometry [10.834177456685538]
Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability. This study explains the principles of sparsity from the perspective of representational geometry, emphasizing the necessity of understanding representations and incorporating representational constraints.
arXiv Detail & Related papers (2025-05-28T15:54:33Z)
- Geometry-Editable and Appearance-Preserving Object Composition [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z)
- Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations [34.88156871518115]
Next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories.
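The "orthant-based clustering" mentioned above can be sketched generically (an illustration of sign-pattern clustering with hypothetical vectors, not necessarily the authors' exact procedure): each vector is assigned to the orthant determined by the signs of its coordinates.

```python
import numpy as np

def orthant_labels(X):
    """Assign each row vector to an orthant by the sign pattern of its
    coordinates; vectors sharing a sign pattern land in the same cluster."""
    signs = (X >= 0).astype(int)          # 0/1 sign pattern per dimension
    # Encode each pattern as an integer label (one orthant = one label)
    powers = 2 ** np.arange(X.shape[1])
    return signs @ powers

X = np.array([[ 1.0,  2.0],
              [ 0.5,  3.0],
              [-1.0,  0.2],
              [-2.0, -0.5]])
print(orthant_labels(X))  # -> [3 3 2 0]: first two rows share an orthant
```

In d dimensions this yields up to 2^d candidate clusters, so in practice one would restrict to a small set of concept directions before reading off sign patterns.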
arXiv Detail & Related papers (2025-05-13T08:46:04Z)
- Constrained belief updates explain geometric structures in transformer representations [1.1666234644810893]
We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models. Our analysis focuses on single-layer transformers, revealing how the first attention layer implements constrained updates. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail.
arXiv Detail & Related papers (2025-02-04T03:03:54Z)
- Relative Representations: Topological and Geometric Perspectives [50.85040046976025]
Relative representations are an established approach to zero-shot model stitching. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
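The baseline relative-representation construction that this line of work extends can be sketched as follows (a generic illustration with hypothetical data; the normalization for non-isotropic rescalings proposed above is not shown): each sample is re-encoded by its cosine similarities to a fixed set of anchor samples.

```python
import numpy as np

def relative_representation(X, anchors):
    """Represent each sample by its cosine similarities to a fixed set of
    anchor samples; these coordinates are invariant to rotations of the
    original embedding space."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))   # 5 toy embeddings in 8 dimensions
anchors = X[:3]                   # first 3 samples serve as anchors

R = relative_representation(X, anchors)

# Rotating the embedding space leaves the relative representation unchanged
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
R_rot = relative_representation(X @ Q, anchors @ Q)
assert np.allclose(R, R_rot)
```

This rotation invariance is what enables stitching encoders trained in different runs: their embedding spaces may differ by an orthogonal transformation, but their relative coordinates agree.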
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- Sigma Flows for Image and Data Labeling and Learning Structured Prediction [5.875121114945721]
This paper introduces the sigma flow model for the prediction of structured labelings of data observed on a Riemannian manifold. The approach combines the Laplace-Beltrami framework for image denoising and enhancement, introduced by Sochen, Kimmel and Malladi about 25 years ago, with the assignment flow approach introduced and studied by the authors.
arXiv Detail & Related papers (2024-08-28T17:04:56Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.