Emergence of meta-stable clustering in mean-field transformer models
- URL: http://arxiv.org/abs/2410.23228v1
- Date: Wed, 30 Oct 2024 17:16:38 GMT
- Title: Emergence of meta-stable clustering in mean-field transformer models
- Authors: Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
- Abstract summary: We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere.
We focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction.
- Score: 1.6385815610837167
- Abstract: We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in (Geshkovski et al., 2023). Studying the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, in this paper we provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the iid uniform initialization and prove that, in the limit of large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, the structure characterizing the meta-stable manifold is explicitly identified, as a function of the inverse temperature parameter of the model, by the index maximizing a certain rescaling of Gegenbauer polynomials.
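As a rough illustration of the kind of particle system the abstract describes, the sketch below integrates a finite-token attention flow on the unit sphere with explicit Euler steps. The specific drift (unnormalized attention with identity query, key, and value matrices, in the spirit of Geshkovski et al., 2023), the function names, and all parameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def attention_drift(x, beta):
    """Tangential part of the attention-weighted mean of the tokens.

    x has shape (n, d) with unit-norm rows. Assumed model (illustrative):
    dx_i/dt = P_{x_i}( (1/n) * sum_j exp(beta * <x_i, x_j>) * x_j ),
    where P_x(v) = v - <x, v> x projects v onto the tangent space at x.
    """
    weights = np.exp(beta * (x @ x.T))             # (n, n) attention weights
    v = weights @ x / x.shape[0]                   # (n, d) weighted token mean
    radial = np.sum(v * x, axis=1, keepdims=True)  # <x_i, v_i> for each token
    return v - radial * x                          # drop the radial component

def interaction_energy(x, beta):
    """Mean of exp(beta * <x_i, x_j>); the assumed flow ascends this energy."""
    return float(np.mean(np.exp(beta * (x @ x.T))))

def simulate(n=64, d=3, beta=4.0, dt=0.01, steps=500, seed=0):
    """Explicit Euler steps, each followed by renormalization onto the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # i.i.d. uniform initialization
    for _ in range(steps):
        x = x + dt * attention_drift(x, beta)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x

if __name__ == "__main__":
    beta = 4.0
    x_start = simulate(beta=beta, steps=0)
    x_end = simulate(beta=beta, steps=500)
    print("energy at start:", interaction_energy(x_start, beta))
    print("energy at end:  ", interaction_energy(x_end, beta))
```

Calling `simulate` with `steps=0` returns the initial configuration, so the energy before and after the flow can be compared for the same seed. A growing energy reflects tokens grouping together; how many groups persist and for how long is the question the paper analyzes, which this toy simulation does not settle.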
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Topological complexity of spiked random polynomials and finite-rank spherical integrals [2.1756081703276]
In particular, we establish variational formulas for the exponentials of the average number of total critical points and the determinants of local parameters of a finite-rank spiked Gaussian Wigner matrix.
The analysis is based on recent advances on finite-rank spherical integrals by [Guionnet, Husson] to study the large deviations of multi-rank spiked Gaussian Wigner matrices.
There is an exact threshold for the external parameters such that, once exceeded, the complexity function vanishes into new regions in which the critical points are close to the given vectors.
arXiv Detail & Related papers (2023-12-19T16:52:01Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Manifold Diffusion Fields [11.4726574705951]
We present an approach that unlocks learning of diffusion models of data in non-Euclidean geometries.
We define an intrinsic coordinate system on the manifold via the eigen-functions of the Laplace-Beltrami Operator.
We show that MDF can capture distributions of such functions with better diversity and fidelity than previous approaches.
arXiv Detail & Related papers (2023-05-24T21:42:45Z) - Multielement polynomial chaos Kriging-based metamodelling for Bayesian inference of non-smooth systems [0.0]
This paper presents a surrogate modelling technique based on domain partitioning for Bayesian parameter inference of highly nonlinear engineering models.
The developed surrogate model combines, in a piecewise function, an array of local Polynomial Chaos based Kriging metamodels constructed on a finite set of non-overlapping subdomains of the input space.
The efficiency and accuracy of the proposed approach are validated through two case studies, including an analytical benchmark and a numerical case study.
arXiv Detail & Related papers (2022-12-05T13:22:39Z) - Counting Phases and Faces Using Bayesian Thermodynamic Integration [77.34726150561087]
We introduce a new approach to reconstruction of the thermodynamic functions and phase boundaries in two-parametric statistical mechanics systems.
We use the proposed approach to accurately reconstruct the partition functions and phase diagrams of the Ising model and the exactly solvable non-equilibrium TASEP.
arXiv Detail & Related papers (2022-05-18T17:11:23Z) - Multiway Ensemble Kalman Filter [9.0932688770957]
We study the emergence of sparsity and multiway structures in second-order statistical characterizations of dynamical processes governed by partial differential equations (PDEs).
We show that multiway data generated from the Poisson and the convection-diffusion types of PDEs can be accurately tracked via the ensemble Kalman filter (EnKF).
arXiv Detail & Related papers (2021-12-08T15:04:34Z) - Determination of the critical exponents in dissipative phase transitions: Coherent anomaly approach [51.819912248960804]
We propose a generalization of the coherent anomaly method to extract the critical exponents of a phase transition occurring in the steady-state of an open quantum many-body system.
arXiv Detail & Related papers (2021-03-12T13:16:18Z) - Out-of-time-order correlations and the fine structure of eigenstate thermalisation [58.720142291102135]
Out-of-time-order correlators (OTOCs) have become established as a tool to characterise quantum information dynamics and thermalisation.
We show explicitly that the OTOC is indeed a precise tool to explore the fine details of the Eigenstate Thermalisation Hypothesis (ETH).
We provide an estimation of the finite-size scaling of $\omega_{\textrm{GOE}}$ for the general class of observables composed of sums of local operators in the infinite-temperature regime.
arXiv Detail & Related papers (2021-03-01T17:51:46Z) - Towards quantum simulation of Sachdev-Ye-Kitaev model [5.931069258860319]
We study a simplified version of the Sachdev-Ye-Kitaev (SYK) model with real interactions by exact diagonalization.
A quantum phase transition from a chaotic state to an integrable state is observed by increasing the discrete separation.
arXiv Detail & Related papers (2020-03-03T14:18:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.