Related papers: Geometric Scaling of Bayesian Inference in LLMs

URL: http://arxiv.org/abs/2512.23752v1
Date: Sat, 27 Dec 2025 05:29:55 GMT
Title: Geometric Scaling of Bayesian Inference in LLMs
Authors: Naman Aggarwal, Siddhartha R. Dalal, Vishal Misra
Abstract summary: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference. We investigate whether this geometric signature persists in production-grade language models.
Score: 0.4779196219827507
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric substrate -- low-dimensional value manifolds and progressively orthogonal keys -- that encodes posterior structure. We investigate whether this geometric signature persists in production-grade language models. Across Pythia, Phi-2, Llama-3, and Mistral families, we find that last-layer value representations organize along a single dominant axis whose position strongly correlates with predictive entropy, and that domain-restricted prompts collapse this structure into the same low-dimensional manifolds observed in synthetic settings. To probe the role of this geometry, we perform targeted interventions on the entropy-aligned axis of Pythia-410M during in-context learning. Removing or perturbing this axis selectively disrupts the local uncertainty geometry, whereas matched random-axis interventions leave it intact. However, these single-layer manipulations do not produce proportionally specific degradation in Bayesian-like behavior, indicating that the geometry is a privileged readout of uncertainty rather than a singular computational bottleneck. Taken together, our results show that modern language models preserve the geometric substrate that enables Bayesian inference in wind tunnels, and organize their approximate Bayesian updates along this substrate.
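The axis intervention described in the abstract can be pictured as projecting one direction out of a set of vectors and comparing against a random direction. The following is only a minimal sketch under stated assumptions, not the authors' procedure: the vectors are random stand-ins for value representations, `target_axis` is a hypothetical placeholder for the entropy-aligned axis, and `ablate_axis` is an illustrative helper name.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def ablate_axis(vectors, axis):
    """Project out the component of each vector along `axis`."""
    norm = math.sqrt(dot(axis, axis))
    unit = [a / norm for a in axis]
    return [
        [x - dot(vec, unit) * u for x, u in zip(vec, unit)]
        for vec in vectors
    ]


random.seed(0)
dim = 6

# Assumption: random stand-ins for last-layer value representations.
vectors = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)]

# Assumption: a hypothetical placeholder for the entropy-aligned axis.
target_axis = [1.0] + [0.0] * (dim - 1)

# Random-axis control; ablate_axis normalizes it to unit length.
random_axis = [random.gauss(0.0, 1.0) for _ in range(dim)]

ablated = ablate_axis(vectors, target_axis)
control = ablate_axis(vectors, random_axis)
```

After the call, every vector in `ablated` has no component along `target_axis`, while `control` has lost only its component along the random direction. In an experiment of the kind the abstract describes, the comparison of interest would be how a downstream uncertainty measure changes under each of the two ablations.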

Related papers

A Geometry-Adaptive Deep Variational Framework for Phase Discovery in the Landau-Brazovskii Model [4.702925112226925]
We propose a Geometry-Adaptive Deep Variational Framework (GeoDVF) for pattern-forming systems. By explicitly treating the domain size as trainable variables within the variational formulation, GeoDVF naturally eliminates artificial stress during training. We show that GeoDVF provides a robust and geometry-consistent variational solver capable of identifying both stable and metastable states without prior knowledge.
arXiv Detail & Related papers (2026-03-05T13:32:23Z)
The Bayesian Geometry of Transformer Attention [0.4779196219827507]
We build controlled environments where the true posterior is known in closed form and memorization is provably impossible. Small transformers reproduce Bayesian posteriors with $10^{-3}$--$10^{-4}$ bit accuracy, while capacity-matched MLPs fail by orders of magnitude.
arXiv Detail & Related papers (2025-12-27T05:28:58Z)
Manifold Percolation: from generative model to Reinforce learning [0.26905021039717986]
Generative modeling is typically framed as learning mapping rules, but from an observer's perspective without access to these rules, the task becomes disentangling the geometric support from the probability distribution. We propose that continuum percolation is uniquely suited to this support analysis, as the sampling process effectively projects high-dimensional density estimation onto a geometric counting problem on the support.
arXiv Detail & Related papers (2025-11-25T17:12:42Z)
VIKING: Deep variational inference with stochastic projections [48.946143517489496]
Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. We propose a simple variational family that considers two independent linear subspaces of the parameter space. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization.
arXiv Detail & Related papers (2025-10-27T15:38:35Z)
Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings [3.8806403512213787]
Non-Euclidean foundation models place representations in curved spaces such as hyperbolic geometry. Small input changes appear subtle to standard input-space detectors but produce disproportionately large shifts in the model's representation space. We propose a geometry-adaptive trigger and evaluate it across tasks and architectures.
arXiv Detail & Related papers (2025-10-07T19:24:43Z)
Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees.
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
Bayesian Circular Regression with von Mises Quasi-Processes [57.88921637944379]
In this work we explore a family of expressive and interpretable distributions over circle-valued random functions. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Gibbs sampling. We present experiments applying this model to the prediction of wind directions and the percentage of the running gait cycle as a function of joint angles.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z)
A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.