Quantitative Clustering in Mean-Field Transformer Models
- URL: http://arxiv.org/abs/2504.14697v2
- Date: Wed, 30 Apr 2025 13:35:39 GMT
- Title: Quantitative Clustering in Mean-Field Transformer Models
- Authors: Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet
- Abstract summary: The evolution of tokens through a deep transformer model can be modeled as an interacting particle system. We investigate the long-time clustering of mean-field transformer models.
- Score: 32.46389492080837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evolution of tokens through a deep transformer model can be modeled as an interacting particle system that has been shown to exhibit an asymptotic clustering behavior akin to the synchronization phenomenon in Kuramoto models. In this work, we investigate the long-time clustering of mean-field transformer models. More precisely, we establish exponential rates of contraction to a Dirac point mass for any suitably regular initialization: under some assumptions on the parameters of transformer models, any suitably regular mean-field initialization synchronizes exponentially fast, with quantitative rates.
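The clustering behavior described in the abstract can be illustrated numerically. The abstract does not state the equations, so the sketch below assumes a form commonly used in related work on self-attention dynamics: n tokens on the unit sphere, identity query, key and value matrices, and a softmax-weighted drift projected onto the tangent space. The function name, parameters (`beta`, `dt`, `steps`), the hemisphere initialization and the spread measure are all illustrative choices, not taken from the paper, and a finite particle system only approximates the mean-field setting.

```python
import numpy as np


def simulate_attention_particles(n=32, d=3, beta=1.0, dt=0.05, steps=2000, seed=0):
    """Explicit Euler scheme for an ASSUMED self-attention particle system.

    Assumed dynamics (not stated in the abstract):
        dx_i/dt = P_{x_i}( sum_j softmax_j(beta * <x_i, x_j>) * x_j ),
    where P_x projects onto the tangent space of the unit sphere at x.
    """
    rng = np.random.default_rng(seed)

    # Illustrative initialization: all tokens in an open hemisphere.
    x = rng.normal(size=(n, d))
    x[:, 0] = np.abs(x[:, 0]) + 0.1
    x /= np.linalg.norm(x, axis=1, keepdims=True)

    def spread(points):
        # Largest value of 1 - <x_i, x_j> over all pairs; zero when all tokens coincide.
        return float(np.max(1.0 - points @ points.T))

    history = [spread(x)]
    for _ in range(steps):
        gram = x @ x.T
        # Row-wise softmax of beta * <x_i, x_j>, shifted for numerical stability.
        weights = np.exp(beta * (gram - gram.max(axis=1, keepdims=True)))
        weights /= weights.sum(axis=1, keepdims=True)
        drift = weights @ x
        # Remove the radial component so the drift is tangent to the sphere.
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x
        # Euler step followed by renormalization back onto the sphere.
        x = x + dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)
        history.append(spread(x))
    return x, np.array(history)


if __name__ == "__main__":
    tokens, history = simulate_attention_particles()
    print("initial spread:", history[0])
    print("final spread:", history[-1])
```

Plotting `history` on a logarithmic axis gives a rough visual check of how quickly the tokens collapse toward a single point under these assumptions; it is not a substitute for the quantitative rates established in the paper.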
Related papers
- Scalable Equilibrium Sampling with Sequential Boltzmann Generators [60.00515282300297]
We extend the Boltzmann generator framework and introduce Sequential Boltzmann generators with two key improvements.
The first is a highly efficient non-equivariant Transformer-based normalizing flow operating directly on all-atom Cartesian coordinates.
We demonstrate the first equilibrium sampling in Cartesian coordinates of tri, tetra, and hexapeptides that were so far intractable for prior Boltzmann generators.
arXiv Detail & Related papers (2025-02-25T18:59:13Z) - Transformers and Their Roles as Time Series Foundation Models [14.61139607588868]
We analyze transformers as time series foundation models, focusing on their approximation and generalization capabilities. We prove that MOIRAI is capable of automatically fitting autoregressive models with an arbitrary number of covariates. Experiments support our theoretical findings, highlighting the efficacy of transformers as time series foundation models.
arXiv Detail & Related papers (2025-02-05T17:18:55Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces novel deep dynamical models designed to represent continuous-time sequences. We train the model using maximum likelihood estimation with Markov chain Monte Carlo. Experimental results on oscillating systems, videos and real-world state sequences (MuJoCo) demonstrate that our model with the learnable energy-based prior outperforms existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Probabilistic Topic Modelling with Transformer Representations [0.9999629695552195]
We propose the Transformer-Representation Neural Topic Model (TNTM).
This approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling.
Experimental results show that our proposed model achieves results on par with various state-of-the-art approaches in terms of embedding coherence.
arXiv Detail & Related papers (2024-03-06T14:27:29Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Investigating Recurrent Transformers with Dynamic Halt [64.862738244735]
We study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism. We propose and investigate novel ways to extend and combine the methods.
arXiv Detail & Related papers (2024-02-01T19:47:31Z) - Quantum Effects on the Synchronization Dynamics of the Kuramoto Model [62.997667081978825]
We show that quantum fluctuations hinder the emergence of synchronization, albeit not entirely suppressing it.
We derive an analytical expression for the critical coupling, highlighting its dependence on the model parameters.
arXiv Detail & Related papers (2023-06-16T16:41:16Z) - Modeling the space-time correlation of pulsed twin beams [68.8204255655161]
Entangled twin-beams generated by parametric down-conversion are among the favorite sources for imaging-oriented applications.
We propose a semi-analytic model which aims to bridge the gap between time-consuming numerical simulations and the unrealistic plane-wave pump theory.
arXiv Detail & Related papers (2023-01-18T11:29:49Z) - Simulating Heisenberg Interactions in the Ising Model with Strong Drive
Fields [0.0]
An Ising model with large driving fields over discrete time intervals is shown to be reproduced by an effective XXZ-Heisenberg model.
For specific orientations of the drive field, the dynamics of the XXX-Heisenberg model is reproduced.
arXiv Detail & Related papers (2022-07-19T17:52:31Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Emergent fractal phase in energy stratified random models [0.0]
We study the effects of partial correlations in kinetic hopping terms of long-range random matrix models on their localization properties.
We show that any deviation from the completely correlated case leads to the emergent non-ergodic delocalization in the system.
arXiv Detail & Related papers (2021-06-07T18:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.