Cascade of phase transitions in the training of Energy-based models
- URL: http://arxiv.org/abs/2405.14689v3
- Date: Fri, 08 Nov 2024 09:37:14 GMT
- Title: Cascade of phase transitions in the training of Energy-based models
- Authors: Dimitrios Bachtis, Giulio Biroli, Aurélien Decelle, Beatriz Seoane
- Abstract summary: We investigate the feature encoding process in a prototypical energy-based generative model, the Bernoulli-Bernoulli RBM.
Our study tracks the evolution of the model's weight matrix through its singular value decomposition.
We validate our theoretical results by training the Bernoulli-Bernoulli RBM on real data sets.
- Score: 9.945465034701288
- Abstract: In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures, and end with a numerical analysis of training on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition, revealing a series of phase transitions associated with a progressive learning of the principal modes of the empirical probability distribution. The model first learns the center of mass of the modes and then progressively resolves all modes through a cascade of phase transitions. We first describe this process in a controlled setup in which the training dynamics can be studied analytically. We then validate our theoretical results by training the Bernoulli-Bernoulli RBM on real data sets. By using data sets of increasing dimension, we show that learning indeed leads to sharp phase transitions in the high-dimensional limit. Moreover, we propose and test a mean-field finite-size scaling hypothesis. This shows that the first phase transition is in the same universality class as the one we studied analytically, which is reminiscent of the mean-field paramagnetic-to-ferromagnetic phase transition.
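The idea of tracking the weight matrix through its singular value decomposition during training can be illustrated with a minimal NumPy sketch. It is not the authors' code: it assumes a hypothetical toy dataset of two noisy binary prototypes, trains a Bernoulli-Bernoulli RBM with one-step contrastive divergence (CD-1, which may differ from the training scheme used in the paper), and records the singular values of the weight matrix W after every epoch. The helper names `make_two_mode_data` and `train_rbm_track_svd`, the bias initialization, and all hyperparameters are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function: sigma(x) = (1 + tanh(x/2)) / 2.
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def make_two_mode_data(n_samples, n_visible, flip_prob, rng):
    """Hypothetical toy data: two random binary prototypes, each bit flipped with prob flip_prob."""
    prototypes = rng.integers(0, 2, size=(2, n_visible))
    labels = rng.integers(0, 2, size=n_samples)
    data = prototypes[labels].astype(float)
    flips = rng.random(data.shape) < flip_prob
    data[flips] = 1.0 - data[flips]
    return data


def train_rbm_track_svd(data, n_hidden=20, lr=0.05, n_epochs=30,
                        batch_size=64, seed=0):
    """Train a Bernoulli-Bernoulli RBM with CD-1; return singular values of W per epoch.

    Assumed energy: E(v, h) = -v^T W h - a^T v - b^T h, with binary v and h.
    Row 0 of the result holds the singular values at initialization.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 1e-3 * rng.standard_normal((n_visible, n_hidden))  # near-zero start
    # Visible biases at the log-odds of the empirical means (a common heuristic).
    p = np.clip(data.mean(axis=0), 1e-3, 1.0 - 1e-3)
    a = np.log(p / (1.0 - p))
    b = np.zeros(n_hidden)
    history = [np.linalg.svd(W, compute_uv=False)]
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            v0 = data[order[start:start + batch_size]]
            # Positive phase: hidden activation probabilities given the data.
            ph0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step v0 -> h0 -> v1, then p(h | v1).
            pv1 = sigmoid(h0 @ W.T + a)
            v1 = (rng.random(pv1.shape) < pv1).astype(float)
            ph1 = sigmoid(v1 @ W + b)
            m = v0.shape[0]
            # CD-1 estimate of the log-likelihood gradient.
            W += lr * (v0.T @ ph0 - v1.T @ ph1) / m
            a += lr * (v0 - v1).mean(axis=0)
            b += lr * (ph0 - ph1).mean(axis=0)
        # NumPy returns singular values sorted in descending order.
        history.append(np.linalg.svd(W, compute_uv=False))
    return np.array(history)


if __name__ == "__main__":
    data = make_two_mode_data(1000, 50, 0.1, np.random.default_rng(1))
    sv = train_rbm_track_svd(data, n_hidden=20, n_epochs=30)
    print(sv.shape)  # (31, 20): initial row plus one row per epoch
    # Top three singular values every 5 epochs.
    print(np.round(sv[::5, :3], 3))
```

Plotting the columns of `sv` against the epoch index shows which singular values grow away from the near-zero initialization and when. This toy run is far from the high-dimensional limit, so it should not be expected to reproduce the sharp transitions reported in the paper; it only shows the quantity being tracked.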
Related papers
- Learning Pore-scale Multi-phase Flow from Experimental Data with Graph Neural Network [2.2101344151283944]
Current numerical models are often incapable of accurately capturing the complex pore-scale physics observed in experiments.
We propose a graph neural network-based approach and directly learn pore-scale fluid flow using micro-CT experimental data.
(2024-11-21T15:01:17Z)
- Learning and Transferring Sparse Contextual Bigrams with Linear Transformers [47.37256334633102]
We introduce the Sparse Contextual Bigram (SCB) model, where the next token's generation depends on a sparse set of earlier positions determined by the last token.
We analyze the training dynamics and sample complexity of learning SCB using a one-layer linear transformer with a gradient-based algorithm.
We prove that, provided there is a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage.
(2024-10-30T20:29:10Z)
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
(2024-09-05T18:14:22Z)
- Dynamical Regimes of Diffusion Models [14.797301819675454]
We study generative diffusion models in the regime where the dimension of space and the number of data are large.
Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process.
The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models.
(2024-02-28T17:19:26Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
(2024-02-19T02:08:09Z)
- Explaining the Machine Learning Solution of the Ising Model [0.0]
This work shows how such an explanation can be obtained for the ferromagnetic Ising model, the main target of several machine learning (ML) studies in statistical physics.
By using a neural network (NN) without hidden layers (the simplest possible) and informed by the symmetry of the Hamiltonian, an explanation is provided for the strategy used in finding the supervised learning solution.
These results pave the way to a physics-informed explainable generalized framework, enabling the extraction of physical laws and principles from the parameters of the models.
(2024-02-18T20:47:33Z)
- From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression [14.521929085104441]
We investigate the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models.
We delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent.
In particular, we observe that performing an ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
(2023-10-02T22:59:17Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
(2021-02-12T12:23:13Z)
- Unsupervised machine learning of topological phase transitions from experimental data [52.77024349608834]
We apply unsupervised machine learning techniques to experimental data from ultracold atoms.
We obtain the topological phase diagram of the Haldane model in a completely unbiased fashion.
Our work provides a benchmark for unsupervised detection of new exotic phases in complex many-body systems.
(2021-01-14T16:38:21Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition between the kernel and rich regimes empirically for more complex matrix factorization models and multilayer non-linear networks.
(2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.