Dual Training of Energy-Based Models with Overparametrized Shallow
Neural Networks
- URL: http://arxiv.org/abs/2107.05134v1
- Date: Sun, 11 Jul 2021 21:43:18 GMT
- Title: Dual Training of Energy-Based Models with Overparametrized Shallow
Neural Networks
- Authors: Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna,
Eric Vanden-Eijnden
- Abstract summary: Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation.
We derive a dual formulation of maximum likelihood EBM training and consider a variant of the resulting algorithm in which the particles are sometimes restarted at random samples drawn from the data set; performing these restarts at every iteration step corresponds to score matching training.
These results are illustrated in simple numerical experiments.
- Score: 41.702175127106784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) are generative models that are usually trained via
maximum likelihood estimation. This approach becomes challenging in generic
situations where the trained energy is nonconvex, due to the need to sample the
Gibbs distribution associated with this energy. Using general Fenchel duality
results, we derive variational principles dual to maximum likelihood EBMs with
shallow overparametrized neural network energies, both in the active (aka
feature-learning) and lazy regimes. In the active regime, this dual formulation
leads to a training algorithm in which one updates concurrently the particles
in the sample space and the neurons in the parameter space of the energy. We
also consider a variant of this algorithm in which the particles are sometimes
restarted at random samples drawn from the data set, and show that performing
these restarts at every iteration step corresponds to score matching training.
Using intermediate parameter setups in our dual algorithm thereby gives a way
to interpolate between maximum likelihood and score matching training. These
results are illustrated in simple numerical experiments.
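To make the training loop concrete, here is a minimal numerical sketch of the concurrent particle/neuron updates with data restarts, in the spirit of the algorithm described above. Everything below (the 1-D toy data, the ReLU feature map, step sizes, and the restart probability `p_restart`) is an illustrative assumption, not the authors' code; only the output weights are trained here, so the sketch is closer to the lazy regime than the active one.

```python
import numpy as np

# Minimal sketch (see lead-in): shallow EBM E(x) = (1/m) sum_j c_j relu(w_j x + b_j),
# trained by alternating Langevin updates of sample-space particles with a
# gradient step on the neurons; p_restart interpolates between maximum
# likelihood (0) and score-matching-like training (1), per the abstract.

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=512)          # toy data set

m, P = 64, 256                                           # neurons, particles
w, b = rng.normal(size=m), rng.normal(size=m)            # fixed input weights
c = np.zeros(m)                                          # trained output weights
particles = rng.normal(size=P)

feats = lambda x: np.maximum(np.outer(x, w) + b, 0.0)    # relu features, (len(x), m)

def grad_E_x(x):
    # dE/dx = (1/m) sum_j c_j w_j 1[w_j x + b_j > 0]
    act = (np.outer(x, w) + b > 0.0).astype(float)
    return act @ (c * w) / m

eps, p_restart = 1e-2, 0.1
for step in range(2000):
    # particle update: Langevin dynamics on the current energy ...
    particles += -eps * grad_E_x(particles) + np.sqrt(2 * eps) * rng.normal(size=P)
    # ... with occasional restarts at random data points
    mask = rng.random(P) < p_restart
    particles[mask] = rng.choice(data, size=mask.sum())

    # neuron update: descend the cross-entropy gradient
    # E_data[dE/dc] - E_model[dE/dc], the latter estimated with the particles
    grad_c = (feats(data).mean(0) - feats(particles).mean(0)) / m
    c -= 0.1 * grad_c
```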
Related papers
- Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models [19.871787625519513]
We propose to perform thermodynamic integration (TI) along an alchemical pathway represented by a trainable neural network.
In this work, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient.
The ability of the resulting energy-based diffusion model to sample all intermediate ensembles allows us to perform TI from a single reference calculation.
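For reference, the classical thermodynamic-integration identity that such an interpolating Hamiltonian is plugged into (standard background, not taken from the paper's text):

```latex
% Free-energy difference along H_\lambda, with H_0 (non-interacting) and H_1 (interacting):
\Delta F \;=\; F_1 - F_0 \;=\; \int_0^1 \left\langle \frac{\partial H_\lambda}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda,
\qquad
\left\langle f \right\rangle_{\lambda} \;=\; \frac{\int f(x)\, e^{-\beta H_\lambda(x)}\, dx}{\int e^{-\beta H_\lambda(x)}\, dx}.
```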
arXiv Detail & Related papers (2024-06-04T13:42:42Z)
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities [109.23137009609519]
We propose Iterated Denoising Energy Matching (iDEM), which alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our matching objective.
We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster.
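A minimal structural sketch of this two-phase alternation on a 1-D toy energy; the inner score estimator below mirrors the spirit of a smoothed-target Monte Carlo estimate, while the sampler and the linear score model are deliberate simplifications, not the paper's method:

```python
import numpy as np

# Toy target exp(-U); alternate (I) sampling with the current score model
# and (II) regressing it onto a Monte Carlo estimate of the smoothed target
# score. All modelling choices here are simplifications (see lead-in).

rng = np.random.default_rng(0)
U = lambda x: 0.5 * (x - 1.0) ** 2                # toy energy, minimum at x = 1

def smoothed_target_score(x, sigma, K=32):
    # score of the sigma-smoothed target at x: (E[x0 | x] - x) / sigma^2,
    # with x0 ~ N(x, sigma^2) self-normalized-importance-weighted by exp(-U)
    x0 = x[:, None] + sigma * rng.normal(size=(x.size, K))
    w = np.exp(-U(x0))
    w /= w.sum(axis=1, keepdims=True) + 1e-12
    return ((w * x0).sum(axis=1) - x) / sigma ** 2

theta = np.zeros(2)                                # linear score model t0 + t1*x

for outer in range(100):
    # (I) sample: crude annealed Langevin walk driven by the current model
    x = 2.0 * rng.normal(size=256)
    for sigma in np.linspace(1.0, 0.1, 20):
        s = theta[0] + theta[1] * x
        x += 0.1 * sigma ** 2 * s + 0.1 * sigma * rng.normal(size=x.size)

    # (II) match: least-squares fit of the model score to the MC estimate
    y = smoothed_target_score(x, sigma=0.3)
    X = np.stack([np.ones_like(x), x], axis=1)
    theta = np.linalg.lstsq(X, y, rcond=None)[0]

# for this Gaussian toy the fitted score approaches d/dx log p(x) ≈ 1 - x
```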
arXiv Detail & Related papers (2024-02-09T01:11:23Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
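For context, the recovery-likelihood idea that CDRL builds on (standard background from the earlier diffusion recovery likelihood line of work, sketched rather than quoted from this paper's text): conditioning on a Gaussian-noised observation turns sampling into an easier, better-conditioned problem,

```latex
% With \tilde{x} = x + \sigma \epsilon, \epsilon \sim \mathcal{N}(0, I), the
% conditional of the clean sample given the noisy one is itself an EBM,
p_\theta(x \mid \tilde{x}) \;\propto\; \exp\!\Big( -E_\theta(x) \;-\; \tfrac{1}{2\sigma^2}\,\|\tilde{x} - x\|^2 \Big),
% and the quadratic tether to \tilde{x} makes MCMC on this conditional far easier
% than on p_\theta(x) directly; a sequence of noise levels chains such steps together.
```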
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Efficient Training of Energy-Based Models Using Jarzynski Equality [13.636994997309307]
Energy-based models (EBMs) are generative models inspired by statistical physics.
The computation of the gradient of the training objective with respect to the model parameters requires sampling the model's Gibbs distribution.
Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality can be used to perform this computation efficiently.
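The key identity being exploited is classical nonequilibrium thermodynamics; the training-time use, as the summary above describes it, is that walkers carrying work-based weights track the moving Gibbs distribution even though the energy changes at every step. A sketch of the identity and the resulting estimator (notation mine):

```latex
% Jarzynski equality for a protocol driving the energy from E_{\theta_0} to E_{\theta_T}:
\big\langle e^{-\beta W} \big\rangle \;=\; e^{-\beta \Delta F},
% so walkers X_i evolved alongside the changing energy, each carrying the weight
% w_i = e^{-\beta W_i} with W_i its accumulated nonequilibrium work, give
\mathbb{E}_{p_{\theta_T}}[f] \;\approx\; \frac{\sum_i w_i\, f(X_i)}{\sum_i w_i},
% a consistent estimate without waiting for Langevin re-equilibration.
```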
arXiv Detail & Related papers (2023-05-30T21:07:52Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
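A minimal sketch of the general pattern described here: a differentiable forward model mapping physical parameters to an observable, fit to measured data by gradient descent. The damped-cosine "surrogate" and its hand-written derivative below are illustrative stand-ins for the trained network and automatic differentiation:

```python
import numpy as np

# Toy differentiable forward model: parameter J -> signal over momenta q.
# Stand-in for a trained surrogate network (see lead-in); gradient by hand.

rng = np.random.default_rng(0)
q = np.linspace(0.0, 10.0, 200)
model = lambda J: np.cos(J * q) * np.exp(-0.1 * q)

true_J = 1.7                                       # "unknown" physical parameter
data = model(true_J) + 0.02 * rng.normal(size=q.size)

J = 1.5             # initial guess; the toy loss is nonconvex in J, so it matters
for _ in range(3000):
    r = model(J) - data                            # residual against experiment
    dm_dJ = -q * np.sin(J * q) * np.exp(-0.1 * q)  # d model / d J
    J -= 1e-3 * np.mean(2.0 * r * dm_dJ)           # gradient step on the MSE
# J drifts toward true_J = 1.7 from a nearby initial guess
```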
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Effective Dynamics of Generative Adversarial Networks [16.51305515824504]
Generative adversarial networks (GANs) are a class of machine-learning models that use adversarial training to generate new samples.
One major form of training failure, known as mode collapse, involves the generator failing to reproduce the full diversity of modes in the target probability distribution.
We present an effective model of GAN training, which captures the learning dynamics by replacing the generator neural network with a collection of particles in the output space.
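A deliberately crude toy version of that particle picture, useful for seeing how mode collapse appears: the generator is replaced by a cloud of output-space particles that ascend the discriminator's logit. With the weak (linear) discriminator assumed below, every particle receives the same drift, so the cloud cannot split to cover both target modes; all modelling choices are mine, not the paper's effective model:

```python
import numpy as np

# Two-mode real data; "generator" = particles in output space (see lead-in).
rng = np.random.default_rng(0)
real = np.concatenate([rng.normal(-2.0, 0.3, 200), rng.normal(2.0, 0.3, 200)])
fake = rng.normal(0.0, 1.0, 400)                   # particle cloud
a, b = 0.0, 0.0                                    # linear discriminator logit a*x + b

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for step in range(500):
    # discriminator: one SGD step of logistic regression (real = 1, fake = 0)
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(real.size), np.zeros(fake.size)])
    p = sigmoid(a * x + b)
    a += 0.1 * np.mean((y - p) * x)
    b += 0.1 * np.mean(y - p)
    # particles: ascend the logit; d(a*x + b)/dx = a, identical for every particle
    fake += 0.05 * a
```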
arXiv Detail & Related papers (2022-12-08T22:04:01Z)
- A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller-Segel chemotaxis systems [3.6184545598911724]
We study a regularized interacting particle method for computing aggregation patterns and near singular solutions of a Keller-Segel (KS) chemotaxis system in two and three space dimensions.
We further develop DeepParticle (DP) method to learn and generate solutions under variations of physical parameters.
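As a reference point, a minimal regularized interacting-particle step of the kind such methods build on, for a 2-D parabolic-elliptic Keller-Segel model: particles attract through a regularized Newtonian kernel and diffuse. Constants, signs, and the regularization `eps` are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
N, chi, mu, eps, dt = 400, 1.0, 0.5, 0.05, 1e-3
X = rng.normal(size=(N, 2))                        # particle positions

for step in range(1000):
    d = X[:, None, :] - X[None, :, :]              # pairwise differences
    r2 = (d ** 2).sum(-1) + eps ** 2               # regularized squared distance
    K = -d / (2 * np.pi * r2[..., None])           # attractive Newtonian kernel
    drift = chi * K.sum(1) / N                     # mean-field interaction
    X += drift * dt + np.sqrt(2 * mu * dt) * rng.normal(size=X.shape)
```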
arXiv Detail & Related papers (2022-08-31T20:52:01Z)
- Particle Dynamics for Learning EBMs [83.59335980576637]
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model.
The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration.
This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model.
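The sampling bottleneck referred to here is visible directly in the maximum-likelihood gradient of an EBM (standard identity, with $p_\theta(x) \propto e^{-E_\theta(x)}$):

```latex
\nabla_\theta \,\mathbb{E}_{x \sim \text{data}}\big[\log p_\theta(x)\big]
\;=\; -\,\mathbb{E}_{x \sim \text{data}}\big[\nabla_\theta E_\theta(x)\big]
\;+\; \mathbb{E}_{x \sim p_\theta}\big[\nabla_\theta E_\theta(x)\big];
% the second ("negative-phase") expectation is over the current model and is
% what contrastive approaches must re-sample at every iteration.
```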
arXiv Detail & Related papers (2021-11-26T23:41:07Z)
- Controllable and Compositional Generation with Latent-Space Energy-Based Models [60.87740144816278]
Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
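A sketch of the standard energy-composition recipe this line of work uses (conjunction and disjunction of concepts as operations on energies, up to normalization constants); the exact operators and temperatures in the paper may differ:

```python
import numpy as np

# Composing unnormalized densities p_k(x) ∝ exp(-E_k(x)) via their energies.

def E_and(e1, e2):
    # p1(x) * p2(x): product of experts, energies simply add
    return e1 + e2

def E_or(e1, e2):
    # p1(x) + p2(x): mixture, log-sum-exp over negated energies
    return -np.logaddexp(-e1, -e2)

def E_not(e, alpha=1.0):
    # crude negation p(x)^(-alpha): flips the energy's sign; in practice only
    # used in combination with other concepts to stay normalizable
    return -alpha * e
```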
arXiv Detail & Related papers (2021-10-21T03:31:45Z)
- The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z)