Emergence of meta-stable clustering in mean-field transformer models
- URL: http://arxiv.org/abs/2410.23228v1
- Date: Wed, 30 Oct 2024 17:16:38 GMT
- Title: Emergence of meta-stable clustering in mean-field transformer models
- Authors: Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
- Abstract summary: We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere.
We focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction.
- Score: 1.6385815610837167
- Abstract: We model the evolution of tokens within a deep stack of Transformer layers as a continuous-time flow on the unit sphere, governed by a mean-field interacting particle system, building on the framework introduced in (Geshkovski et al., 2023). Studying the corresponding mean-field Partial Differential Equation (PDE), which can be interpreted as a Wasserstein gradient flow, in this paper we provide a mathematical investigation of the long-term behavior of this system, with a particular focus on the emergence and persistence of meta-stable phases and clustering phenomena, key elements in applications like next-token prediction. More specifically, we perform a perturbative analysis of the mean-field PDE around the iid uniform initialization and prove that, in the limit of large number of tokens, the model remains close to a meta-stable manifold of solutions with a given structure (e.g., periodicity). Further, the structure characterizing the meta-stable manifold is explicitly identified, as a function of the inverse temperature parameter of the model, by the index maximizing a certain rescaling of Gegenbauer polynomials.
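The particle picture described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code. It assumes an attention-like drift in which each token is pulled toward the others with weights exp(beta * <x_i, x_j>), averaged over tokens and projected onto the tangent space of the sphere. The 1/n normalization, the explicit Euler scheme, and all names (`step`, `simulate`, `beta`, `dt`) are illustrative assumptions.

```python
import numpy as np


def step(x, beta, dt):
    """One explicit Euler step of an attention-like particle flow on the unit sphere.

    x has shape (n, d); each row is a token of unit norm.
    The 1/n averaging of the weights is an assumption of this sketch.
    """
    n = x.shape[0]
    # Pairwise interaction weights exp(beta * <x_i, x_j>).
    weights = np.exp(beta * (x @ x.T))            # shape (n, n)
    drift = weights @ x / n                       # shape (n, d)
    # Remove the radial component so the drift is tangent to the sphere at x_i.
    radial = np.sum(drift * x, axis=1, keepdims=True)
    tangent = drift - radial * x
    x_new = x + dt * tangent
    # Renormalize to correct the drift off the sphere caused by the Euler step.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)


def simulate(n=64, d=3, beta=4.0, dt=0.05, steps=200, seed=0):
    """Evolve n tokens from an i.i.d. uniform initialization on the sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    for _ in range(steps):
        x = step(x, beta, dt)
    return x


if __name__ == "__main__":
    tokens = simulate()
    # The Gram matrix of inner products gives a rough view of how tokens group together.
    gram = tokens @ tokens.T
    print("mean pairwise inner product:", gram.mean())
```

Here `beta` plays the role of the inverse temperature parameter mentioned in the abstract. Inspecting the Gram matrix at intermediate times for different values of `beta` is one informal way to look for clustered configurations.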
Related papers
- Nonperturbative features in the Lie-algebraic Kähler sigma model with fermions [0.0]
We investigate a quantum mechanical system originating from a Lie-algebraic Kähler sigma model with multiple right-handed chiral fermions.
We identify and analyze saddle point solutions and examine their contributions within the perturbative expansions of the ground state energy.
We propose that the elongation parameter becomes relevant in shaping the system's quantum behavior from the three-loop level.
arXiv Detail & Related papers (2024-12-16T04:55:14Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces novel deep dynamical models designed to represent continuous-time sequences.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experimental results on oscillating systems, videos and real-world state sequences (MuJoCo) demonstrate that our model with the learnable energy-based prior outperforms existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Manifold Diffusion Fields [11.4726574705951]
We present an approach that unlocks learning of diffusion models of data in non-Euclidean geometries.
We define an intrinsic coordinate system on the manifold via the eigen-functions of the Laplace-Beltrami Operator.
We show that MDF can capture distributions of such functions with better diversity and fidelity than previous approaches.
arXiv Detail & Related papers (2023-05-24T21:42:45Z) - Multielement polynomial chaos Kriging-based metamodelling for Bayesian
inference of non-smooth systems [0.0]
This paper presents a surrogate modelling technique based on domain partitioning for Bayesian parameter inference of highly nonlinear engineering models.
The developed surrogate model combines in a piecewise function an array of local Polynomial Chaos based Kriging metamodels constructed on a finite set of non-overlapping subdomains of the input space.
The efficiency and accuracy of the proposed approach are validated through two case studies, including an analytical benchmark and a numerical case study.
arXiv Detail & Related papers (2022-12-05T13:22:39Z) - Counting Phases and Faces Using Bayesian Thermodynamic Integration [77.34726150561087]
We introduce a new approach to reconstruction of the thermodynamic functions and phase boundaries in two-parametric statistical mechanics systems.
We use the proposed approach to accurately reconstruct the partition functions and phase diagrams of the Ising model and the exactly solvable non-equilibrium TASEP.
arXiv Detail & Related papers (2022-05-18T17:11:23Z) - Multiway Ensemble Kalman Filter [9.0932688770957]
We study the emergence of sparsity and multiway structures in second-order statistical characterizations of dynamical processes governed by partial differential equations (PDEs).
We show that multiway data generated from the Poisson and the convection-diffusion types of PDEs can be accurately tracked via the ensemble Kalman filter (EnKF).
arXiv Detail & Related papers (2021-12-08T15:04:34Z) - Determination of the critical exponents in dissipative phase transitions: Coherent anomaly approach [51.819912248960804]
We propose a generalization of the coherent anomaly method to extract the critical exponents of a phase transition occurring in the steady-state of an open quantum many-body system.
arXiv Detail & Related papers (2021-03-12T13:16:18Z) - Out-of-time-order correlations and the fine structure of eigenstate thermalisation [58.720142291102135]
Out-of-time-order correlators (OTOCs) have become established as a tool to characterise quantum information dynamics and thermalisation.
We show explicitly that the OTOC is indeed a precise tool to explore the fine details of the Eigenstate Thermalisation Hypothesis (ETH).
We provide an estimation of the finite-size scaling of $\omega_{\textrm{GOE}}$ for the general class of observables composed of sums of local operators in the infinite-temperature regime.
arXiv Detail & Related papers (2021-03-01T17:51:46Z) - Towards quantum simulation of Sachdev-Ye-Kitaev model [5.931069258860319]
We study a simplified version of the Sachdev-Ye-Kitaev (SYK) model with real interactions by exact diagonalization.
A quantum phase transition from a chaotic state to an integrable state is observed by increasing the discrete separation.
arXiv Detail & Related papers (2020-03-03T14:18:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.