Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics
- URL: http://arxiv.org/abs/2002.01987v4
- Date: Sat, 22 Jun 2024 16:57:27 GMT
- Title: Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics
- Authors: Belinda Tzen, Maxim Raginsky
- Abstract summary: We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian."
We show that the problem can be phrased as global minimization of a free energy functional on the space of paths over probability measures on the weights.
- Score: 7.1822457112352955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian" in the sense of Kullback-Leibler divergence. Our setting is the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuous ensemble. We show that the problem can be phrased as global minimization of a free energy functional on the space of (finite-length) paths over probability measures on the weights. This functional trades off the $L^2$ approximation risk of the terminal measure against the KL divergence of the path with respect to an isotropic Brownian motion prior. We characterize the unique global minimizer and examine the dynamics in the space of probability measures over weights that can achieve it. In particular, we show that the optimal path-space measure corresponds to the F\"ollmer drift, the solution to a McKean-Vlasov optimal control problem closely related to the classic Schr\"odinger bridge problem. While the F\"ollmer drift cannot in general be obtained in closed form, thus limiting its potential algorithmic utility, we illustrate the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization. Specifically, we show that it closely tracks the F\"ollmer drift when the regularization is such that the minimizing density is log-concave.
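As a rough illustration of the objective described above (a sketch in simplified notation, not the paper's exact formulation; the symbols $f$, $\varphi$, $\beta$, and the unit time horizon are assumptions), the free energy trades off the terminal approximation risk against a path-space KL term:

```latex
% Schematic free-energy functional over path measures P on C([0,1]; R^d):
% terminal L^2 risk plus KL divergence from the Brownian-motion prior W.
\begin{aligned}
  F(P) &= \mathbb{E}_{x}\Big[\big(f(x) - \hat{f}_{\mu_1}(x)\big)^2\Big]
          \;+\; \beta\, D_{\mathrm{KL}}(P \,\|\, W), \\
  \hat{f}_{\mu_1}(x) &= \int \varphi(x, w)\, \mu_1(\mathrm{d}w),
\end{aligned}
```

where $\mu_1$ is the terminal (time-$1$) marginal of $P$ over the weights, $\varphi$ is the neuron activation, and $\beta > 0$ sets the entropic-regularization strength. Per the abstract, the global minimizer is induced by the F\"ollmer drift, with the mean-field Langevin diffusion serving as a tractable finite-time approximation; a particle-level simulation sketch appears after the related-papers list below.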
Related papers
- Mean-field underdamped Langevin dynamics and its spacetime discretization [5.832709207282124]
We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures.
Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics.
arXiv Detail & Related papers (2023-12-26T23:59:04Z)
- Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
- Projected Langevin dynamics and a gradient flow for entropic optimal transport [0.8057006406834466]
We introduce analogous diffusion dynamics that sample from the solution of an entropy-regularized optimal transport problem.
By studying the induced Wasserstein geometry of the submanifold $\Pi(\mu,\nu)$, we argue that the SDE can be viewed as a Wasserstein gradient flow on this space of couplings.
arXiv Detail & Related papers (2023-09-15T17:55:56Z)
- Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schr\"odinger Equation [56.9919517199927]
"Wasserstein Quantum Monte Carlo" (WQMC) uses the gradient flow induced by the Wasserstein metric, rather than the Fisher-Rao metric, and corresponds to transporting the probability mass rather than teleporting it.
We demonstrate empirically that the dynamics of WQMC result in faster convergence to the ground state of molecular systems.
arXiv Detail & Related papers (2023-07-06T17:54:08Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Accelerating Convergence in Global Non-Convex Optimization with Reversible Diffusion [0.0]
Langevin dynamics has been used extensively in global non-convex optimization.
Our proposed method is used to investigate the trade-off between convergence speed and discretization error.
arXiv Detail & Related papers (2023-05-19T07:49:40Z)
- Trajectory Inference via Mean-field Langevin in Path Space [0.17205106391379024]
Trajectory inference aims at recovering the dynamics of a population from snapshots of its temporal marginals.
A min-entropy estimator relative to the Wiener measure in path space was introduced by Lavenant et al.
arXiv Detail & Related papers (2022-05-14T23:13:00Z)
- Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z)
- Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields [53.31927549039624]
We show that a piecewise discretization preserves contrast better than existing discretization approaches.
We apply this theory to the problem of matching two images.
arXiv Detail & Related papers (2021-07-13T12:31:06Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
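The abstract and several entries above refer to the mean-field Langevin dynamics as a finite-time approximation scheme. Below is a minimal, self-contained sketch of a generic N-particle mean-field Langevin discretization for two-layer-net regression; the tanh activation, synthetic 1-D data, and all hyperparameters are illustrative assumptions, not the paper's setup or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: approximate f(x) = sin(3x) on [-1, 1].
X = rng.uniform(-1.0, 1.0, size=(256, 1))
y = np.sin(3.0 * X[:, 0])

N, d = 512, 2        # particles (hidden neurons); weight dim = (slope, bias)
beta = 1e-3          # entropic regularization strength (illustrative)
lr, steps = 0.05, 2000

# Initialize weights from the standard Gaussian, matching a Gaussian prior.
W = rng.normal(size=(N, d))

def features(W, X):
    """phi(x, w) = tanh(w_1 x + w_2); returns an (n_data, N) matrix."""
    return np.tanh(X @ W[:, :1].T + W[:, 1])

for _ in range(steps):
    Phi = features(W, X)            # (n_data, N)
    pred = Phi.mean(axis=1)         # mean-field network output, (n_data,)
    resid = pred - y
    # Gradient of the first variation of the L2 risk at each particle:
    # grad_w E_x[ 2 (pred - y) phi(x, w) ], via the chain rule through tanh.
    dPhi = 1.0 - Phi**2             # tanh'
    g_slope = 2.0 * (resid[:, None] * dPhi * X).mean(axis=0)   # (N,)
    g_bias = 2.0 * (resid[:, None] * dPhi).mean(axis=0)        # (N,)
    grad = np.stack([g_slope, g_bias], axis=1)                 # (N, d)
    # Mean-field Langevin step: gradient descent plus Brownian noise
    # scaled by the entropic regularization beta.
    W += -lr * grad + np.sqrt(2.0 * lr * beta) * rng.normal(size=W.shape)

final_risk = np.mean((features(W, X).mean(axis=1) - y) ** 2)
print(f"final L2 risk: {final_risk:.4f}")
```

Entropic regularization enters through beta: larger values keep the weight distribution closer to the Gaussian prior at the cost of approximation risk, mirroring the trade-off in the free-energy functional sketched after the abstract.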