Depth-Wise Emergence of Prediction-Centric Geometry in Large Language Models
- URL: http://arxiv.org/abs/2602.04931v1
- Date: Wed, 04 Feb 2026 11:00:16 GMT
- Title: Depth-Wise Emergence of Prediction-Centric Geometry in Large Language Models
- Authors: Shahar Haim, Daniel C McNamee
- Abstract summary: We show that decoder-only large language models exhibit a transition from context-processing to prediction-forming phases of computation. Using a unified framework combining geometric analysis with mechanistic intervention, we demonstrate that late-layer representations implement a structured geometric code that enables selective causal control over token prediction.
- Score: 1.0742675209112622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that decoder-only large language models exhibit a depth-wise transition from context-processing to prediction-forming phases of computation accompanied by a reorganization of representational geometry. Using a unified framework combining geometric analysis with mechanistic intervention, we demonstrate that late-layer representations implement a structured geometric code that enables selective causal control over token prediction. Specifically, angular organization of the representation geometry parametrizes prediction distributional similarity, while representation norms encode context-specific information that does not determine prediction. Together, these results provide a mechanistic-geometric account of the dynamics of transforming context into predictions in LLMs.
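The abstract's division of labor between angle and norm can be illustrated with a toy numpy sketch (hypothetical matrices and dimensions, not the paper's actual setup): when a final RMS normalization precedes the unembedding, as in many decoder-only LLMs, the next-token distribution depends only on the direction of the residual-stream vector, while its norm is divided out.

```python
import numpy as np

def rms_norm(h, eps=1e-6):
    # RMSNorm rescales h to unit root-mean-square, discarding its overall norm
    return h / np.sqrt(np.mean(h**2) + eps)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, vocab = 16, 10
W_U = rng.standard_normal((vocab, d))  # toy unembedding matrix

h = rng.standard_normal(d)
h_scaled = 3.7 * h                     # same direction, different norm

p1 = softmax(W_U @ rms_norm(h))
p2 = softmax(W_U @ rms_norm(h_scaled))

# The final normalization makes the prediction depend only on direction:
assert np.allclose(p1, p2)
```

In this simplified picture, any context information stored purely in the norm is invisible to the token prediction, consistent with the abstract's claim that norms encode context-specific information that does not determine prediction.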
Related papers
- TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning [104.66714520975837]
We introduce a geometry-grounded benchmark designed to evaluate compositional spatial reasoning through the lens of the classic Tangram game. We propose the Tangram Construction Expression (TCE), a symbolic geometric framework that grounds tangram assemblies in exact, machine-verifiable coordinate specifications. We conduct extensive evaluation experiments on advanced open-source and proprietary models, revealing an interesting insight: MLLMs tend to prioritize matching the target silhouette while neglecting geometric constraints.
arXiv Detail & Related papers (2026-01-23T07:35:05Z)
- Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture [0.0]
We show that self-attention emerges from projecting corpus-level co-occurrence statistics into sequence context. Our analysis demonstrates that the Transformer architecture's particular algebraic form follows from these projection principles.
arXiv Detail & Related papers (2025-11-16T02:25:04Z)
- CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees.
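For context, the standard split conformal procedure that such geometric corrections build on can be sketched as follows (a generic sketch with toy residuals, not the CP$^2$ method itself):

```python
import numpy as np

def split_conformal_quantile(cal_scores, alpha):
    """Split conformal prediction: given calibration nonconformity scores
    |y - yhat|, return the quantile q such that intervals yhat +/- q cover
    fresh points with probability >= 1 - alpha."""
    n = len(cal_scores)
    # finite-sample corrected quantile level
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(level, 1.0), method="higher")

rng = np.random.default_rng(2)
cal_scores = np.abs(rng.standard_normal(1000))  # toy calibration residuals
q = split_conformal_quantile(cal_scores, alpha=0.1)
# q approximates the 90th percentile of |N(0, 1)|
```

The coverage guarantee assumes exchangeability between calibration and test data, which is exactly what geometric shifts break and what canonicalization aims to restore.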
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
- Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings [67.5600169375126]
We study the task of panoptic symbol spotting in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representation. We propose VecFormer, a novel method that addresses these challenges through line-based representation of primitives.
arXiv Detail & Related papers (2025-05-29T12:33:11Z)
- Sparsification and Reconstruction from the Perspective of Representation Geometry [10.834177456685538]
Sparse Autoencoders (SAEs) have emerged as a predominant tool in mechanistic interpretability. This study explains the principles of sparsity from the perspective of representational geometry, emphasizing the necessity of understanding representations and incorporating representational constraints.
arXiv Detail & Related papers (2025-05-28T15:54:33Z)
- Geometry-Editable and Appearance-Preserving Object Composition [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z)
- Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations [34.88156871518115]
Next-token prediction (NTP) optimization leads language models to extract and organize semantic structure from text. We demonstrate that concepts corresponding to larger singular values are learned earlier during training, yielding a natural semantic hierarchy. This insight motivates orthant-based clustering, a method that combines concept signs to identify interpretable semantic categories.
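The "orthant-based clustering" mentioned above can be sketched generically (an illustration of sign-pattern clustering with hypothetical vectors, not necessarily the authors' exact procedure): each vector is assigned to the orthant determined by the signs of its coordinates.

```python
import numpy as np

def orthant_labels(X):
    """Assign each row vector to an orthant by the sign pattern of its
    coordinates; vectors sharing a sign pattern land in the same cluster."""
    signs = (X >= 0).astype(int)          # 0/1 sign pattern per dimension
    # Encode each pattern as an integer label (one orthant = one label)
    powers = 2 ** np.arange(X.shape[1])
    return signs @ powers

X = np.array([[ 1.0,  2.0],
              [ 0.5,  3.0],
              [-1.0,  0.2],
              [-2.0, -0.5]])
print(orthant_labels(X))  # -> [3 3 2 0]: first two rows share an orthant
```

In d dimensions this yields up to 2^d candidate clusters, so in practice one would restrict to a small set of concept directions before reading off sign patterns.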
arXiv Detail & Related papers (2025-05-13T08:46:04Z)
- Constrained belief updates explain geometric structures in transformer representations [1.1666234644810893]
We integrate the model-agnostic theory of optimal prediction with mechanistic interpretability to analyze transformers trained on a tractable family of hidden Markov models. Our analysis focuses on single-layer transformers, revealing how the first attention layer implements constrained updates. We show how both the algorithmic behavior and the underlying geometry of these representations can be theoretically predicted in detail.
arXiv Detail & Related papers (2025-02-04T03:03:54Z)
- Relative Representations: Topological and Geometric Perspectives [50.85040046976025]
Relative representations are an established approach to zero-shot model stitching. First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
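The baseline relative-representation construction that this line of work extends can be sketched as follows (a generic illustration with hypothetical data; the normalization for non-isotropic rescalings proposed above is not shown): each sample is re-encoded by its cosine similarities to a fixed set of anchor samples.

```python
import numpy as np

def relative_representation(X, anchors):
    """Represent each sample by its cosine similarities to a fixed set of
    anchor samples; these coordinates are invariant to rotations of the
    original embedding space."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 8))   # 5 toy embeddings in 8 dimensions
anchors = X[:3]                   # first 3 samples serve as anchors

R = relative_representation(X, anchors)

# Rotating the embedding space leaves the relative representation unchanged
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
R_rot = relative_representation(X @ Q, anchors @ Q)
assert np.allclose(R, R_rot)
```

This rotation invariance is what enables stitching encoders trained in different runs: their embedding spaces may differ by an orthogonal transformation, but their relative coordinates agree.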
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- Sigma Flows for Image and Data Labeling and Learning Structured Prediction [5.875121114945721]
This paper introduces the sigma flow model for the prediction of structured labelings of data observed on a Riemannian manifold. The approach combines the Laplace-Beltrami framework for image denoising and enhancement, introduced by Sochen, Kimmel and Malladi about 25 years ago, with the assignment flow approach introduced and studied by the authors.
arXiv Detail & Related papers (2024-08-28T17:04:56Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.