Identification of Mean-Field Dynamics using Transformers
- URL: http://arxiv.org/abs/2410.16295v1
- Date: Sun, 06 Oct 2024 19:47:24 GMT
- Title: Identification of Mean-Field Dynamics using Transformers
- Authors: Shiba Biswal, Karthik Elamvazhuthi, Rishi Sonthalia
- Abstract summary: This paper investigates the use of transformer architectures to approximate the mean-field dynamics of particle systems exhibiting collective behavior.
Specifically, we prove that if a finite-dimensional transformer can effectively approximate the finite-dimensional vector field governing the particle system, then the expected output of this transformer provides a good approximation for the infinite-dimensional mean-field vector field.
- Score: 3.8916312075738273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the use of transformer architectures to approximate the mean-field dynamics of interacting particle systems exhibiting collective behavior. Such systems are fundamental in modeling phenomena across physics, biology, and engineering, including gas dynamics, opinion formation, biological networks, and swarm robotics. The key characteristic of these systems is that the particles are indistinguishable, leading to permutation-equivariant dynamics. We demonstrate that transformers, which inherently possess permutation equivariance, are well-suited for approximating these dynamics. Specifically, we prove that if a finite-dimensional transformer can effectively approximate the finite-dimensional vector field governing the particle system, then the expected output of this transformer provides a good approximation for the infinite-dimensional mean-field vector field. Leveraging this result, we establish theoretical bounds on the distance between the true mean-field dynamics and those obtained using the transformer. We validate our theoretical findings through numerical simulations on the Cucker-Smale model for flocking, and the mean-field system for training two-layer neural networks.
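The abstract's central object, the finite-N particle vector field whose expectation yields the mean-field limit, can be illustrated with the Cucker-Smale flocking model used in the paper's experiments. The sketch below is illustrative, not the authors' implementation: the kernel exponent and all parameter values are assumptions, and a learned transformer would replace the hand-written field while preserving the same permutation equivariance.

```python
import numpy as np

def cucker_smale_field(x, v, beta=0.5):
    """Finite-N Cucker-Smale alignment field (illustrative form).

    Each particle's acceleration is the average of its velocity
    differences to all other particles, weighted by a kernel that
    decays with pairwise distance. Averaging over particles makes
    the map permutation-equivariant, the key property the paper
    exploits for transformer approximation.
    """
    diff_x = x[:, None, :] - x[None, :, :]           # (N, N, d) pairwise displacements
    diff_v = v[None, :, :] - v[:, None, :]           # (N, N, d) pairwise velocity gaps
    weights = 1.0 / (1.0 + np.sum(diff_x**2, axis=-1)) ** beta   # (N, N) kernel
    return (weights[..., None] * diff_v).mean(axis=1)            # (N, d)

# Monte-Carlo view of the mean-field limit: with many particles sampled
# from a common law, the empirical average approximates the expectation
# defining the infinite-dimensional mean-field vector field.
rng = np.random.default_rng(0)
N = 2000
x = rng.normal(size=(N, 2))
v = rng.normal(size=(N, 2))
dv = cucker_smale_field(x, v)
print(dv.shape)  # (2000, 2)
```

Because the field is an average over indistinguishable particles, relabeling the particles permutes the output in the same way, which is exactly the structure a transformer's attention layers share.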
Related papers
- Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning [30.781578037476347]
We introduce a novel approach to modeling transformer architectures using highly flexible non-autonomous neural ordinary differential equations (ODEs)
Our proposed model parameterizes all weights of attention and feed-forward blocks through neural networks, expressing these weights as functions of a continuous layer index.
Our neural ODE transformer demonstrates performance comparable to or better than vanilla transformers across various configurations and datasets.
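The idea of expressing block weights as functions of a continuous layer index can be sketched minimally. Everything below is an assumption for illustration: the Fourier parameterization, the toy attention-free block, and the Euler integrator stand in for the paper's actual non-autonomous neural ODE.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Hypothetical parameterization: weight matrices vary smoothly with a
# continuous layer index t in [0, 1] via a small Fourier expansion.
A = rng.normal(scale=0.1, size=(3, d, d))

def weights(t):
    """W(t) = A0 + A1*sin(2*pi*t) + A2*cos(2*pi*t): depth-dependent weights."""
    return A[0] + np.sin(2 * np.pi * t) * A[1] + np.cos(2 * np.pi * t) * A[2]

def ode_step(h, t, dt):
    """One Euler step of dh/dt = f(h, W(t)); f is a toy stand-in block."""
    return h + dt * np.tanh(h @ weights(t))

# Integrate the continuous-depth model over layer index 0..1,
# i.e. ten Euler steps play the role of ten discrete layers.
h = rng.normal(size=(4, d))   # 4 tokens of dimension d
for k in range(10):
    h = ode_step(h, k / 10, 0.1)
print(h.shape)  # (4, 8)
```

The point of the continuous index is that depth becomes a resolution choice: refining the step size adds "layers" without adding parameters.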
arXiv Detail & Related papers (2025-03-03T09:12:14Z) - Interpreting Affine Recurrence Learning in GPT-style Transformers [54.01174470722201]
In-context learning allows GPT-style transformers to generalize during inference without modifying their weights.
This paper focuses specifically on their ability to learn and predict affine recurrences as an ICL task.
We analyze the model's internal operations using both empirical and theoretical approaches.
arXiv Detail & Related papers (2024-10-22T21:30:01Z) - Transformers from Diffusion: A Unified Framework for Neural Message Passing [79.9193447649011]
Message passing neural networks (MPNNs) have become a de facto class of model solutions.
We propose an energy-constrained diffusion model, which integrates the inductive bias of diffusion with layer-wise constraints of energy.
Building on these insights, we devise a new class of message passing models, dubbed DIFFormer, whose global attention layers are derived from the principled energy-constrained diffusion framework.
arXiv Detail & Related papers (2024-09-13T17:54:41Z) - Clustering in pure-attention hardmax transformers and its role in sentiment analysis [0.0]
We rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity.
We show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called leaders.
We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model.
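The clustering behavior described in this summary can be reproduced with a toy pure-attention model. The sketch below is an assumption-laden illustration, not the paper's construction: the dot-product score, the step size, the excluded self-attention, and the normalization are all simplified choices.

```python
import numpy as np

def hardmax_layer(x, alpha=0.5):
    """One pure-attention hardmax layer (toy version).

    Each token attends only to the single token maximizing its
    dot-product score (its current 'leader') and moves toward it,
    followed by a normalization sublayer.
    """
    scores = x @ x.T
    np.fill_diagonal(scores, -np.inf)    # exclude self-attention (assumption)
    leaders = scores.argmax(axis=1)
    x = x + alpha * (x[leaders] - x)     # convex step toward the leader
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(2)
x = rng.normal(size=(16, 2))
x /= np.linalg.norm(x, axis=1, keepdims=True)
for _ in range(50):                      # deep stack of identical layers
    x = hardmax_layer(x)

# After many layers, tokens collapse onto a few cluster points.
n_clusters = np.unique(x.round(3), axis=0).shape[0]
print(n_clusters)
```

Mutually-attending pairs merge in a single step and then act as shared attractors, so the number of distinct token values shrinks with depth, mirroring the clustered equilibria the paper characterizes rigorously.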
arXiv Detail & Related papers (2024-06-26T16:13:35Z) - Dynamical Mean-Field Theory of Self-Attention Neural Networks [0.0]
Transformer-based models have demonstrated exceptional performance across diverse domains.
Little is known about how they operate or what their expected dynamics are.
We use methods for the study of asymmetric Hopfield networks in nonequilibrium regimes.
arXiv Detail & Related papers (2024-06-11T13:29:34Z) - Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory [11.3128832831327]
Increasing the size of a Transformer model does not always lead to enhanced performance.
Improved generalization ability occurs as the model memorizes the training samples.
We present a theoretical framework that sheds light on the memorization process and performance dynamics of transformer-based language models.
arXiv Detail & Related papers (2024-05-14T15:48:36Z) - Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling [10.246977481606427]
We study the mechanisms through which different components of Transformer, such as the dot-product self-attention, affect its expressive power.
Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads.
arXiv Detail & Related papers (2024-02-01T11:43:13Z) - Supercharging Graph Transformers with Advective Diffusion [28.40109111316014]
This paper proposes Advective Diffusion Transformer (AdvDIFFormer), a physics-inspired graph Transformer model designed to address this challenge.
We show that AdvDIFFormer has provable capability for controlling generalization error with topological shifts.
Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
arXiv Detail & Related papers (2023-10-10T08:40:47Z) - Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input [50.83356836818667]
We study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs.
Our theoretical results support the practical success of Transformers for high dimensional data.
arXiv Detail & Related papers (2023-05-30T02:44:49Z) - Transformer with Implicit Edges for Particle-based Physics Simulation [135.77656965678196]
Transformer with Implicit Edges (TIE) captures the rich semantics of particle interactions in an edge-free manner.
We evaluate our model on diverse domains of varying complexity and materials.
arXiv Detail & Related papers (2022-07-22T03:45:29Z) - NeuroFluid: Fluid Dynamics Grounding with Particle-Driven Neural Radiance Fields [65.07940731309856]
Deep learning has shown great potential for modeling the physical dynamics of complex particle systems such as fluids.
In this paper, we consider a partially observable scenario known as fluid dynamics grounding.
We propose a differentiable two-stage network named NeuroFluid.
It is shown to reasonably estimate the underlying physics of fluids with different initial shapes, viscosity, and densities.
arXiv Detail & Related papers (2022-03-03T15:13:29Z) - Learning stochastic dynamics and predicting emergent behavior using transformers [0.0]
We show that a neural network can learn the dynamical rules of a system by observation of a single dynamical trajectory of the system.
We train a neural network called a transformer on a single trajectory of the model.
Transformers have the flexibility to learn dynamical rules from observation without explicit enumeration of rates or coarse-graining of configuration space.
arXiv Detail & Related papers (2022-02-17T15:27:21Z) - Neural-Network Quantum States for Periodic Systems in Continuous Space [66.03977113919439]
We introduce a family of neural quantum states for the simulation of strongly interacting systems in the presence of periodicity.
For one-dimensional systems we find very precise estimations of the ground-state energies and the radial distribution functions of the particles.
In two dimensions we obtain good estimations of the ground-state energies, comparable to results obtained from more conventional methods.
arXiv Detail & Related papers (2021-12-22T15:27:30Z) - Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Electric circuit emulation of topological transitions driven by quantum statistics [0.0]
We predict the topological transition in the two-particle interacting system driven by the particles' quantum statistics.
As a toy model, we investigate an extended one-dimensional Hubbard model with two anyonic excitations obeying fractional quantum statistics.
We develop a rigorous method to emulate the eigenmodes and eigenenergies of anyon pairs with resonant electric circuits.
arXiv Detail & Related papers (2021-08-23T22:34:52Z) - Dynamics of two-dimensional open quantum lattice models with tensor networks [0.0]
We develop a tensor network method, based on an infinite Projected Entangled Pair Operator (iPEPO) ansatz, applicable directly in the thermodynamic limit.
We consider dissipative transverse quantum Ising and driven-dissipative hard core boson models in non-mean field limits.
Our method enables to study regimes which are accessible to current experiments but lie well beyond the applicability of existing techniques.
arXiv Detail & Related papers (2020-12-22T18:24:20Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of fine-grained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.