The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors
- URL: http://arxiv.org/abs/2602.02315v1
- Date: Mon, 02 Feb 2026 16:45:05 GMT
- Title: The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors
- Authors: Raphaël Sarfati, Eric Bigelow, Daniel Wurgaft, Jack Merullo, Atticus Geiger, Owen Lewis, Tom McGrath, Ekdeep Singh Lubana
- Abstract summary: Large language models (LLMs) represent prompt-conditioned beliefs (posteriors over answers and claims). We study a controlled setting in which Llama-3.2 generates samples from a normal distribution by implicitly inferring its parameters. We find that curved "belief manifolds" for these parameters form in representation space with sufficient in-context learning.
- Score: 24.477029700560113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) represent prompt-conditioned beliefs (posteriors over answers and claims), but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 generates samples from a normal distribution by implicitly inferring its parameters (mean and standard deviation) given only samples from the distribution in context. We find that curved "belief manifolds" for these parameters form in representation space with sufficient in-context learning, and study how the model adapts when the distribution suddenly changes. While standard linear steering often pushes the model off-manifold and induces coupled, out-of-distribution shifts, geometry- and field-aware steering better preserves the intended belief family. Our work demonstrates linear field probing (LFP) as a simple approach to tile the data manifold and make interventions that respect the underlying geometry. We conclude that rich structure emerges naturally in LLMs and that purely linear concept representations are often an inadequate abstraction.
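The distinction the abstract draws between linear steering and geometry-aware steering can be illustrated on a toy example. The sketch below is purely hypothetical (it is not the paper's LFP method): it models a "belief manifold" as a circle embedded in an 8-dimensional activation space and shows how adding a fixed direction to an activation leaves the manifold, whereas moving along the manifold's own parameterization does not.

```python
import numpy as np

def belief_point(mu):
    """Hypothetical embedding: the belief manifold is the unit circle in the
    first two coordinates of an 8-dim activation space, padded with zeros."""
    v = np.zeros(8)
    v[0], v[1] = np.cos(mu), np.sin(mu)
    return v

h = belief_point(0.3)  # activation encoding a believed parameter mu = 0.3

# Standard linear steering: add a fixed direction (here, the manifold's
# tangent at mu = 0.3) scaled by a steering strength of 0.7.
tangent = np.zeros(8)
tangent[0], tangent[1] = -np.sin(0.3), np.cos(0.3)
h_linear = h + 0.7 * tangent  # straight-line step leaves the circle

# Geometry-aware steering (sketch): move along the manifold itself by
# re-parameterizing, so the result stays within the belief family.
h_manifold = belief_point(0.3 + 0.7)

# Distance from the manifold (|norm - 1| in the circle's plane):
off_manifold_error = abs(np.linalg.norm(h_linear[:2]) - 1.0)
on_manifold_error = abs(np.linalg.norm(h_manifold[:2]) - 1.0)
print(off_manifold_error > on_manifold_error)  # prints True
```

Because the tangent step is orthogonal to the point it starts from, the linearly steered activation ends up at distance sqrt(1 + 0.7^2) from the origin, off the unit-circle manifold; the re-parameterized step stays exactly on it.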
Related papers
- Probing the Geometry of Diffusion Models with the String Method [13.771271465889432]
We introduce a framework based on the string method that computes continuous paths between samples by evolving curves under the learned score function. Operating on pretrained models without retraining, our approach interpolates between three regimes. For image diffusion models, MEPs contain high-likelihood but unrealistic "cartoon" images, confirming prior observations that likelihood maxima appear unrealistic. For protein structure prediction, our method computes transition pathways between metastable conformers directly from models trained on static structures.
arXiv Detail & Related papers (2026-02-25T17:10:59Z) - Manifold-Aware Perturbations for Constrained Generative Modeling [1.6431177510318926]
We develop a computationally cheap, mathematically justified, and highly flexible distributional modification for combating known pitfalls in equality-constrained generative models. We show that our approach consistently enables data distribution recovery and stable sampling with both diffusion models and normalizing flows.
arXiv Detail & Related papers (2026-01-30T16:34:33Z) - Causal Manifold Fairness: Enforcing Geometric Invariance in Representation Learning [0.0]
We introduce Causal Manifold Fairness (CMF), a novel framework that bridges causal inference and geometric deep learning. By enforcing constraints on the Jacobian and Hessian of the decoder, CMF ensures that the rules of the latent space are preserved across demographic groups. We validate CMF on synthetic Structural Causal Models (SCMs), demonstrating that it effectively disentangles sensitive geometric warping while preserving task utility.
arXiv Detail & Related papers (2026-01-06T14:05:22Z) - Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency [52.52950138164424]
We show that when leveraging off-the-shelf (vision) foundation models for feature extraction, the geometric shapes of the resulting feature distributions exhibit remarkable transferability across domains and datasets. We embody our geometric knowledge-guided distribution calibration framework in two popular and challenging settings: federated learning and long-tailed recognition. In long-tailed learning, it utilizes the geometric knowledge transferred from sample-rich categories to recover the true distribution for sample-scarce tail classes.
arXiv Detail & Related papers (2025-08-19T05:22:59Z) - From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs [3.6485741522018724]
Large Language Models (LLMs) exhibit strong conversational abilities but often generate falsehoods. We extend the concept cone framework, recently introduced for modeling refusal, to the domain of truth. We identify multi-dimensional cones that causally mediate truth-related behavior across multiple LLM families.
arXiv Detail & Related papers (2025-05-27T22:14:54Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z) - DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion [66.21290235237808]
We introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states.
We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs.
Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks.
arXiv Detail & Related papers (2023-01-23T15:18:54Z) - PAC Generalization via Invariant Representations [41.02828564338047]
We consider the notion of $\epsilon$-approximate invariance in a finite sample setting.
Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees.
Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes.
arXiv Detail & Related papers (2022-05-30T15:50:14Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained to synthesize images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
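The closed-form factorization idea in the entry above, finding latent directions by decomposing pretrained generator weights rather than by training probes, can be sketched in a few lines. This is a hedged illustration, not the paper's exact algorithm: it uses a random matrix as a stand-in for a generator's first affine layer and takes the top eigenvectors of A^T A (equivalently, the right singular vectors of A) as candidate semantic directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained generator's first affine layer weights
# (mapping a 128-dim latent code to 512-dim features); in practice
# this matrix would come from the GAN checkpoint.
A = rng.normal(size=(512, 128))

# Closed-form factorization: candidate interpretable directions are the
# eigenvectors of A^T A, ranked by eigenvalue, i.e. by how strongly the
# layer amplifies motion along each direction.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]       # sort eigenvalues descending
directions = eigvecs[:, order]          # columns: candidate semantic directions
top_k = directions[:, :5]               # strongest 5 directions

# Editing a latent code along a discovered direction (strength 3.0):
z = rng.normal(size=128)
z_edited = z + 3.0 * top_k[:, 0]
```

Because `eigh` operates on a symmetric matrix, the recovered directions are orthonormal, so each edit moves the latent code along exactly one factor without re-scaling the others.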
This list is automatically generated from the titles and abstracts of the papers in this site.