Dual Training of Energy-Based Models with Overparametrized Shallow
Neural Networks
- URL: http://arxiv.org/abs/2107.05134v1
- Date: Sun, 11 Jul 2021 21:43:18 GMT
- Title: Dual Training of Energy-Based Models with Overparametrized Shallow
Neural Networks
- Authors: Carles Domingo-Enrich, Alberto Bietti, Marylou Gabrié, Joan Bruna,
Eric Vanden-Eijnden
- Abstract summary: Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation.
We derive a dual formulation of maximum likelihood EBM training and consider a variant of the resulting algorithm in which the particles are sometimes restarted at random samples drawn from the data set; performing these restarts at every iteration step corresponds to score matching training.
These results are illustrated in simple numerical experiments.
- Score: 41.702175127106784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based models (EBMs) are generative models that are usually trained via
maximum likelihood estimation. This approach becomes challenging in generic
situations where the trained energy is nonconvex, due to the need to sample the
Gibbs distribution associated with this energy. Using general Fenchel duality
results, we derive variational principles dual to maximum likelihood EBMs with
shallow overparametrized neural network energies, both in the active (aka
feature-learning) and lazy regimes. In the active regime, this dual formulation
leads to a training algorithm in which one updates concurrently the particles
in the sample space and the neurons in the parameter space of the energy. We
also consider a variant of this algorithm in which the particles are sometimes
restarted at random samples drawn from the data set, and show that performing
these restarts at every iteration step corresponds to score matching training.
Using intermediate parameter setups in our dual algorithm thereby gives a way
to interpolate between maximum likelihood and score matching training. These
results are illustrated in simple numerical experiments.
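To make the training loop concrete, here is a minimal numerical sketch of the concurrent particle/neuron updates with data restarts, in the spirit of the algorithm described above. Everything below (the 1-D toy data, the ReLU feature map, step sizes, and the restart probability `p_restart`) is an illustrative assumption, not the authors' code; only the output weights are trained here, so the sketch is closer to the lazy regime than the active one.

```python
import numpy as np

# Minimal sketch (see lead-in): shallow EBM E(x) = (1/m) sum_j c_j relu(w_j x + b_j),
# trained by alternating Langevin updates of sample-space particles with a
# gradient step on the neurons; p_restart interpolates between maximum
# likelihood (0) and score-matching-like training (1), per the abstract.

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=512)          # toy data set

m, P = 64, 256                                           # neurons, particles
w, b = rng.normal(size=m), rng.normal(size=m)            # fixed input weights
c = np.zeros(m)                                          # trained output weights
particles = rng.normal(size=P)

feats = lambda x: np.maximum(np.outer(x, w) + b, 0.0)    # relu features, (len(x), m)

def grad_E_x(x):
    # dE/dx = (1/m) sum_j c_j w_j 1[w_j x + b_j > 0]
    act = (np.outer(x, w) + b > 0.0).astype(float)
    return act @ (c * w) / m

eps, p_restart = 1e-2, 0.1
for step in range(2000):
    # particle update: Langevin dynamics on the current energy ...
    particles += -eps * grad_E_x(particles) + np.sqrt(2 * eps) * rng.normal(size=P)
    # ... with occasional restarts at random data points
    mask = rng.random(P) < p_restart
    particles[mask] = rng.choice(data, size=mask.sum())

    # neuron update: descend the cross-entropy gradient
    # E_data[dE/dc] - E_model[dE/dc], the latter estimated with the particles
    grad_c = (feats(data).mean(0) - feats(particles).mean(0)) / m
    c -= 0.1 * grad_c
```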
Related papers
- Neural Thermodynamic Integration: Free Energies from Energy-based Diffusion Models [19.871787625519513]
We propose to perform thermodynamic integration (TI) along an alchemical pathway represented by a trainable neural network.
In this work, we parametrize a time-dependent Hamiltonian interpolating between the interacting and non-interacting systems, and optimize its gradient.
The ability of the resulting energy-based diffusion model to sample all intermediate ensembles allows us to perform TI from a single reference calculation.
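For reference, the classical thermodynamic-integration identity that such an interpolating Hamiltonian is plugged into (standard background, not taken from the paper's text):

```latex
% Free-energy difference along H_\lambda, with H_0 (non-interacting) and H_1 (interacting):
\Delta F \;=\; F_1 - F_0 \;=\; \int_0^1 \left\langle \frac{\partial H_\lambda}{\partial \lambda} \right\rangle_{\lambda} \, d\lambda,
\qquad
\left\langle f \right\rangle_{\lambda} \;=\; \frac{\int f(x)\, e^{-\beta H_\lambda(x)}\, dx}{\int e^{-\beta H_\lambda(x)}\, dx}.
```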
arXiv Detail & Related papers (2024-06-04T13:42:42Z)
- Iterated Denoising Energy Matching for Sampling from Boltzmann Densities [109.23137009609519]
We propose Iterated Denoising Energy Matching (iDEM), which alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our matching objective.
We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2$-$5\times$ faster.
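A minimal structural sketch of this two-phase alternation on a 1-D toy energy; the inner score estimator below mirrors the spirit of a smoothed-target Monte Carlo estimate, while the sampler and the linear score model are deliberate simplifications, not the paper's method:

```python
import numpy as np

# Toy target exp(-U); alternate (I) sampling with the current score model
# and (II) regressing it onto a Monte Carlo estimate of the smoothed target
# score. All modelling choices here are simplifications (see lead-in).

rng = np.random.default_rng(0)
U = lambda x: 0.5 * (x - 1.0) ** 2                # toy energy, minimum at x = 1

def smoothed_target_score(x, sigma, K=32):
    # score of the sigma-smoothed target at x: (E[x0 | x] - x) / sigma^2,
    # with x0 ~ N(x, sigma^2) self-normalized-importance-weighted by exp(-U)
    x0 = x[:, None] + sigma * rng.normal(size=(x.size, K))
    w = np.exp(-U(x0))
    w /= w.sum(axis=1, keepdims=True) + 1e-12
    return ((w * x0).sum(axis=1) - x) / sigma ** 2

theta = np.zeros(2)                                # linear score model t0 + t1*x

for outer in range(100):
    # (I) sample: crude annealed Langevin walk driven by the current model
    x = 2.0 * rng.normal(size=256)
    for sigma in np.linspace(1.0, 0.1, 20):
        s = theta[0] + theta[1] * x
        x += 0.1 * sigma ** 2 * s + 0.1 * sigma * rng.normal(size=x.size)

    # (II) match: least-squares fit of the model score to the MC estimate
    y = smoothed_target_score(x, sigma=0.3)
    X = np.stack([np.ones_like(x), x], axis=1)
    theta = np.linalg.lstsq(X, y, rcond=None)[0]

# for this Gaussian toy the fitted score approaches d/dx log p(x) ≈ 1 - x
```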
arXiv Detail & Related papers (2024-02-09T01:11:23Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs.
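For context, the recovery-likelihood idea that CDRL builds on (standard background from the earlier diffusion recovery likelihood line of work, sketched rather than quoted from this paper's text): conditioning on a Gaussian-noised observation turns sampling into an easier, better-conditioned problem,

```latex
% With \tilde{x} = x + \sigma \epsilon, \epsilon \sim \mathcal{N}(0, I), the
% conditional of the clean sample given the noisy one is itself an EBM,
p_\theta(x \mid \tilde{x}) \;\propto\; \exp\!\Big( -E_\theta(x) \;-\; \tfrac{1}{2\sigma^2}\,\|\tilde{x} - x\|^2 \Big),
% and the quadratic tether to \tilde{x} makes MCMC on this conditional far easier
% than on p_\theta(x) directly; a sequence of noise levels chains such steps together.
```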
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Efficient Training of Energy-Based Models Using Jarzynski Equality [13.636994997309307]
Energy-based models (EBMs) are generative models inspired by statistical physics.
The computation of the gradient of the training objective with respect to the model parameters requires sampling the model's Gibbs distribution.
Here we show how results for nonequilibrium thermodynamics based on Jarzynski equality can be used to perform this computation efficiently.
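The key identity being exploited is classical nonequilibrium thermodynamics; the training-time use, as the summary above describes it, is that walkers carrying work-based weights track the moving Gibbs distribution even though the energy changes at every step. A sketch of the identity and the resulting estimator (notation mine):

```latex
% Jarzynski equality for a protocol driving the energy from E_{\theta_0} to E_{\theta_T}:
\big\langle e^{-\beta W} \big\rangle \;=\; e^{-\beta \Delta F},
% so walkers X_i evolved alongside the changing energy, each carrying the weight
% w_i = e^{-\beta W_i} with W_i its accumulated nonequilibrium work, give
\mathbb{E}_{p_{\theta_T}}[f] \;\approx\; \frac{\sum_i w_i\, f(X_i)}{\sum_i w_i},
% a consistent estimate without waiting for Langevin re-equilibration.
```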
arXiv Detail & Related papers (2023-05-30T21:07:52Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
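A minimal sketch of the general pattern described here: a differentiable forward model mapping physical parameters to an observable, fit to measured data by gradient descent. The damped-cosine "surrogate" and its hand-written derivative below are illustrative stand-ins for the trained network and automatic differentiation:

```python
import numpy as np

# Toy differentiable forward model: parameter J -> signal over momenta q.
# Stand-in for a trained surrogate network (see lead-in); gradient by hand.

rng = np.random.default_rng(0)
q = np.linspace(0.0, 10.0, 200)
model = lambda J: np.cos(J * q) * np.exp(-0.1 * q)

true_J = 1.7                                       # "unknown" physical parameter
data = model(true_J) + 0.02 * rng.normal(size=q.size)

J = 1.5             # initial guess; the toy loss is nonconvex in J, so it matters
for _ in range(3000):
    r = model(J) - data                            # residual against experiment
    dm_dJ = -q * np.sin(J * q) * np.exp(-0.1 * q)  # d model / d J
    J -= 1e-3 * np.mean(2.0 * r * dm_dJ)           # gradient step on the MSE
# J drifts toward true_J = 1.7 from a nearby initial guess
```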
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Effective Dynamics of Generative Adversarial Networks [16.51305515824504]
Generative adversarial networks (GANs) are a class of machine-learning models that use adversarial training to generate new samples.
One major form of training failure, known as mode collapse, involves the generator failing to reproduce the full diversity of modes in the target probability distribution.
We present an effective model of GAN training, which captures the learning dynamics by replacing the generator neural network with a collection of particles in the output space.
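A deliberately crude toy version of that particle picture, useful for seeing how mode collapse appears: the generator is replaced by a cloud of output-space particles that ascend the discriminator's logit. With the weak (linear) discriminator assumed below, every particle receives the same drift, so the cloud cannot split to cover both target modes; all modelling choices are mine, not the paper's effective model:

```python
import numpy as np

# Two-mode real data; "generator" = particles in output space (see lead-in).
rng = np.random.default_rng(0)
real = np.concatenate([rng.normal(-2.0, 0.3, 200), rng.normal(2.0, 0.3, 200)])
fake = rng.normal(0.0, 1.0, 400)                   # particle cloud
a, b = 0.0, 0.0                                    # linear discriminator logit a*x + b

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for step in range(500):
    # discriminator: one SGD step of logistic regression (real = 1, fake = 0)
    x = np.concatenate([real, fake])
    y = np.concatenate([np.ones(real.size), np.zeros(fake.size)])
    p = sigmoid(a * x + b)
    a += 0.1 * np.mean((y - p) * x)
    b += 0.1 * np.mean(y - p)
    # particles: ascend the logit; d(a*x + b)/dx = a, identical for every particle
    fake += 0.05 * a
```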
arXiv Detail & Related papers (2022-12-08T22:04:01Z)
- A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller-Segel chemotaxis systems [3.6184545598911724]
We study a regularized interacting particle method for computing aggregation patterns and near singular solutions of a Keller-Segel (KS) chemotaxis system in two and three space dimensions.
We further develop DeepParticle (DP) method to learn and generate solutions under variations of physical parameters.
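As a reference point, a minimal regularized interacting-particle step of the kind such methods build on, for a 2-D parabolic-elliptic Keller-Segel model: particles attract through a regularized Newtonian kernel and diffuse. Constants, signs, and the regularization `eps` are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
N, chi, mu, eps, dt = 400, 1.0, 0.5, 0.05, 1e-3
X = rng.normal(size=(N, 2))                        # particle positions

for step in range(1000):
    d = X[:, None, :] - X[None, :, :]              # pairwise differences
    r2 = (d ** 2).sum(-1) + eps ** 2               # regularized squared distance
    K = -d / (2 * np.pi * r2[..., None])           # attractive Newtonian kernel
    drift = chi * K.sum(1) / N                     # mean-field interaction
    X += drift * dt + np.sqrt(2 * mu * dt) * rng.normal(size=X.shape)
```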
arXiv Detail & Related papers (2022-08-31T20:52:01Z)
- Particle Dynamics for Learning EBMs [83.59335980576637]
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model.
The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration.
This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model.
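The sampling bottleneck referred to here is visible directly in the maximum-likelihood gradient of an EBM (standard identity, with $p_\theta(x) \propto e^{-E_\theta(x)}$):

```latex
\nabla_\theta \,\mathbb{E}_{x \sim \text{data}}\big[\log p_\theta(x)\big]
\;=\; -\,\mathbb{E}_{x \sim \text{data}}\big[\nabla_\theta E_\theta(x)\big]
\;+\; \mathbb{E}_{x \sim p_\theta}\big[\nabla_\theta E_\theta(x)\big];
% the second ("negative-phase") expectation is over the current model and is
% what contrastive approaches must re-sample at every iteration.
```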
arXiv Detail & Related papers (2021-11-26T23:41:07Z)
- Controllable and Compositional Generation with Latent-Space Energy-Based Models [60.87740144816278]
Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications.
In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes.
By composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.
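A sketch of the standard energy-composition recipe this line of work uses (conjunction and disjunction of concepts as operations on energies, up to normalization constants); the exact operators and temperatures in the paper may differ:

```python
import numpy as np

# Composing unnormalized densities p_k(x) ∝ exp(-E_k(x)) via their energies.

def E_and(e1, e2):
    # p1(x) * p2(x): product of experts, energies simply add
    return e1 + e2

def E_or(e1, e2):
    # p1(x) + p2(x): mixture, log-sum-exp over negated energies
    return -np.logaddexp(-e1, -e2)

def E_not(e, alpha=1.0):
    # crude negation p(x)^(-alpha): flips the energy's sign; in practice only
    # used in combination with other concepts to stay normalizable
    return -alpha * e
```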
arXiv Detail & Related papers (2021-10-21T03:31:45Z)
- The Gaussian equivalence of generative models for learning with shallow neural networks [30.47878306277163]
We study the performance of neural networks trained on data drawn from pre-trained generative models.
We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence.
These results open a viable path to the theoretical study of machine learning models with realistic data.
arXiv Detail & Related papers (2020-06-25T21:20:09Z)