Emergent Manifold Separability during Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2602.20338v1
- Date: Mon, 23 Feb 2026 20:36:17 GMT
- Title: Emergent Manifold Separability during Reasoning in Large Language Models
- Authors: Alexandre Polo, Chanwoo Chun, SueYeon Chung
- Abstract summary: Chain-of-Thought prompting significantly improves reasoning in Large Language Models. We quantify the linear separability of latent representations without the confounding factors of probe training.
- Score: 46.78826734548872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-Thought (CoT) prompting significantly improves reasoning in Large Language Models, yet the temporal dynamics of the underlying representation geometry remain poorly understood. We investigate these dynamics by applying Manifold Capacity Theory (MCT) to a compositional Boolean logic task, allowing us to quantify the linear separability of latent representations without the confounding factors of probe training. Our analysis reveals that reasoning manifests as a transient geometric pulse, where concept manifolds are untangled into linearly separable subspaces immediately prior to computation and rapidly compressed thereafter. This behavior diverges from standard linear probe accuracy, which remains high long after computation, suggesting a fundamental distinction between information that is merely retrievable and information that is geometrically prepared for processing. We interpret this phenomenon as \emph{Dynamic Manifold Management}, a mechanism where the model dynamically modulates representational capacity to optimize the bandwidth of the residual stream throughout the reasoning chain.
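The paper quantifies separability analytically via Manifold Capacity Theory rather than by training probes. As a rough, hypothetical illustration of the quantity being tracked, the sketch below estimates the linear separability of concept manifolds empirically, by checking how often random +/-1 dichotomies over whole manifolds can be split by a linear classifier at a given reasoning step. The toy data, array names, and the SVM-based separability check are assumptions for illustration, not the authors' MCT pipeline.

```python
# Minimal sketch (assumed setup, not the authors' method): empirical proxy for
# manifold separability. Each "manifold" is a set of hidden-state vectors that
# share a concept label at one reasoning step; we measure how often random
# labellings of whole manifolds are linearly separable.
import numpy as np
from sklearn.svm import LinearSVC

def dichotomy_separable(manifolds, labels):
    """Is a +/-1 labelling of whole manifolds linearly separable?

    manifolds: list of (points_i, dim) arrays, one array per concept manifold.
    labels:    array of +/-1, one label per manifold.
    """
    X = np.vstack(manifolds)
    y = np.concatenate([np.full(len(m), l) for m, l in zip(manifolds, labels)])
    clf = LinearSVC(C=1e6, max_iter=20000)   # hard-margin-like linear SVM
    clf.fit(X, y)
    return clf.score(X, y) == 1.0            # separable iff training error is zero

def separability_fraction(manifolds, n_dichotomies=50, seed=0):
    """Fraction of random manifold dichotomies that are linearly separable."""
    rng = np.random.default_rng(seed)
    P = len(manifolds)
    hits = 0
    for _ in range(n_dichotomies):
        labels = rng.choice([-1.0, 1.0], size=P)
        if len(set(labels)) < 2:             # skip degenerate all-same labellings
            continue
        hits += dichotomy_separable(manifolds, labels)
    return hits / n_dichotomies

# Hypothetical usage: P concept manifolds of M points each in an N-dim
# residual stream, evaluated at a single step of the reasoning chain.
rng = np.random.default_rng(1)
P, M, N = 8, 20, 64
centers = rng.normal(size=(P, N))
manifolds = [c + 0.3 * rng.normal(size=(M, N)) for c in centers]
print(f"separable fraction: {separability_fraction(manifolds):.2f}")
```

Tracking this fraction token-by-token along a CoT trace would be one crude way to look for the transient "pulse" of separability the abstract describes; the paper's mean-field capacity measure is more refined but targets the same geometric property.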
Related papers
- Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory [0.0]
This thesis addresses two persistent and closely related challenges in modern deep learning: reliability and efficiency. By analyzing the eigenvalue dynamics of hidden activations across layers and inputs, this work shows that spectral statistics provide a compact, stable, and interpretable lens on model behavior. Within this framework, the first contribution, EigenTrack, introduces a real-time method for detecting hallucinations and out-of-distribution behavior in large language and vision-language models. The second contribution, RMT-KD, presents a principled approach to compressing deep networks via random-matrix-theoretic knowledge distillation.
arXiv Detail & Related papers (2026-02-25T19:11:56Z) - KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra [65.11254608352982]
We introduce a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation.
arXiv Detail & Related papers (2026-02-15T06:32:23Z) - Backpropagation as Physical Relaxation: Exact Gradients in Finite Time [0.0]
Backpropagation is the foundational algorithm for training neural networks. We show it emerges exactly as the finite-time relaxation of a physical dynamical system. We prove that unit-step Euler discretization, the natural timescale of layer transitions, recovers standard backpropagation exactly in 2L steps.
arXiv Detail & Related papers (2026-02-02T16:21:05Z) - A Critical Assessment of Pattern Comparisons Between POD and Autoencoders in Intraventricular Flows [4.123458880886283]
We show that Autoencoder (AE) models can reproduce POD-like coherent structures under specific latent-space configurations.
arXiv Detail & Related papers (2025-12-22T13:21:11Z) - Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting [0.0]
We propose a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial parameters of the underlying process end-to-end via backpropagation.
arXiv Detail & Related papers (2025-12-19T15:32:24Z) - Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought [64.43689151961054]
We theoretically analyze the training dynamics of a simplified two-layer transformer on the directed graph reachability problem. Our analysis reveals that, during training with continuous thought, the index-matching logit first increases and then remains bounded under mild assumptions.
arXiv Detail & Related papers (2025-09-27T15:23:46Z) - Emergence of Quantised Representations Isolated to Anisotropic Functions [0.0]
This paper presents a novel methodology for determining representational structure, which builds upon the existing Spotlight Resonance method. It shows how discrete representations can emerge and organise in autoencoder models, through a controlled ablation study in which only the activation function is altered. Using this technique, the paper assesses whether function-driven symmetries can act as implicit inductive biases on representations.
arXiv Detail & Related papers (2025-07-16T09:27:54Z) - Transformers Are Universally Consistent [14.904264782690639]
We show that Transformers equipped with softmax-based nonlinear attention are universally consistent when tasked with executing Least Squares regression. We derive upper bounds on the empirical error which decay at a provable rate of $\mathcal{O}(t^{-1/2d})$, where $t$ denotes the number of input tokens and $d$ the embedding dimensionality.
arXiv Detail & Related papers (2025-05-30T12:39:26Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of stochasticity in its success is still unclear.
We show that multiplicative noise commonly arises in the discrete parameter recursions of these algorithms due to variance in local rates of convergence, producing heavy-tailed behaviour (a toy recursion illustrating this mechanism is sketched below).
A detailed analysis is conducted of key factors, including step size and data, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
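As a generic, hypothetical illustration of the multiplicative-noise mechanism named in the last related paper (not that paper's own experiments), the sketch below iterates a Kesten-type recursion x_{t+1} = a_t x_t + b_t with random multiplicative factors and estimates the resulting tail index; all parameter values are assumptions chosen only to make the heavy tail visible.

```python
# Toy illustration: multiplicative noise in a stochastic recursion produces
# heavy-tailed stationary behaviour even when the chain is stable on average.
import numpy as np

rng = np.random.default_rng(0)
T, n_chains = 5000, 2000

# Multiplicative factors with E[log a] < 0 but P(a > 1) > 0: stable on
# average, yet occasional expansions create heavy tails (Kesten-type setup).
a = rng.lognormal(mean=-0.1, sigma=0.5, size=(T, n_chains))
b = rng.normal(size=(T, n_chains))

x = np.zeros(n_chains)
for t in range(T):
    x = a[t] * x + b[t]          # x_{t+1} = a_t * x_t + b_t

# Crude tail-index estimate: Hill estimator on the top 5% of |x|.
abs_x = np.sort(np.abs(x))[::-1]
k = int(0.05 * n_chains)
hill = 1.0 / np.mean(np.log(abs_x[:k] / abs_x[k]))
print(f"estimated tail index ~ {hill:.2f} (smaller = heavier tail)")
```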