Related papers: Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble

URL: http://arxiv.org/abs/2502.05784v1
Date: Sun, 09 Feb 2025 05:58:46 GMT
Title: Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
Authors: Atsushi Nitanda, Anzelle Lee, Damian Tan Xing Kai, Mizuki Sakaguchi, Taiji Suzuki
Abstract summary: Mean-field Langevin dynamics (MFLD) is an optimization method derived by taking the mean-field limit of noisy gradient descent for two-layer neural networks. Recent work shows that the approximation error due to finite particles remains uniform in time and diminishes as the number of particles increases. In this paper, we establish an improved PoC result for MFLD, which removes the exponential dependence on the regularization coefficient from the particle approximation term.
Score: 36.19164064733151
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mean-field Langevin dynamics (MFLD) is an optimization method derived by taking the mean-field limit of noisy gradient descent for two-layer neural networks in the mean-field regime. Recently, the propagation of chaos (PoC) for MFLD has gained attention as it provides a quantitative characterization of the optimization complexity in terms of the number of particles and iterations. Remarkable progress by Chen et al. (2022) showed that the approximation error due to finite particles remains uniform in time and diminishes as the number of particles increases. In this paper, by refining the defective log-Sobolev inequality (a key result from that earlier work) under the neural network training setting, we establish an improved PoC result for MFLD, which removes the exponential dependence on the regularization coefficient from the particle approximation term of the optimization complexity. As an application, we propose a PoC-based model ensemble strategy with theoretical guarantees.
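
Illustrative sketch (not from the paper): the abstract describes MFLD as the mean-field limit of noisy gradient descent on a two-layer network, with the finite-particle system as its practical approximation. The NumPy code below shows what one such finite-particle update might look like. Everything specific here is an assumption made for illustration: the tanh activation, the squared loss, the l2 penalty, and the names `predict`, `mfld_step`, `run_mfld`, `eta`, `lam`, and `l2`. It does not implement the paper's ensemble strategy or reproduce its analysis.

```python
import numpy as np

def predict(theta, X):
    """Mean-field two-layer network: average of tanh neurons over particles.

    theta has shape (N, d), one row per particle; X has shape (n, d).
    """
    return np.tanh(X @ theta.T).mean(axis=1)

def mfld_step(theta, X, y, eta, lam, l2, rng):
    """One noisy gradient descent step applied to every particle.

    eta is the step size, lam the (assumed) entropy regularization
    coefficient that sets the noise scale, l2 an assumed weight penalty.
    """
    act = np.tanh(X @ theta.T)            # (n, N) neuron outputs
    resid = act.mean(axis=1) - y          # (n,) residuals of the averaged model
    # Per-particle gradient of the squared loss 0.5 * mean(resid**2),
    # written without the 1/N factor (mean-field scaling), plus the l2 term.
    grad = ((resid[:, None] * (1.0 - act**2)).T @ X) / X.shape[0] + l2 * theta
    noise = rng.standard_normal(theta.shape)
    return theta - eta * grad + np.sqrt(2.0 * lam * eta) * noise

def run_mfld(X, y, n_particles=200, n_steps=500, eta=0.1,
             lam=1e-3, l2=1e-3, seed=0):
    """Run the particle system from a standard normal initialization."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_particles, X.shape[1]))
    for _ in range(n_steps):
        theta = mfld_step(theta, X, y, eta, lam, l2, rng)
    return theta
```

In this sketch, `n_particles` plays the role of the number of particles and `n_steps` the number of iterations mentioned in the abstract; `predict(run_mfld(X, y), X)` evaluates the trained particle average on the inputs.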

Related papers

Pseudospectral method for solving PDEs using Matrix Product States [0.0]
This research focuses on solving time-dependent partial differential equations (PDEs) using matrix product states (MPS). We propose an extension of Hermite Distributed Approximating Functionals (HDAF) to MPS, a highly accurate pseudospectral method for approximating functions of derivatives.
arXiv Detail & Related papers (2024-09-04T17:53:38Z)
Improved Particle Approximation Error for Mean Field Neural Networks [9.817855108627452]
Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions. Recent works have demonstrated the uniform-in-time propagation of chaos for MFLD. We improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors.
arXiv Detail & Related papers (2024-05-24T17:59:06Z)
RoPINN: Region Optimized Physics-Informed Neural Networks [66.38369833561039]
Physics-informed neural networks (PINNs) have been widely applied to solve partial differential equations (PDEs). This paper proposes and theoretically studies a new training paradigm, region optimization. A practical training algorithm, Region Optimized PINN (RoPINN), is seamlessly derived from this new paradigm.
arXiv Detail & Related papers (2024-05-23T09:45:57Z)
Taming the Interacting Particle Langevin Algorithm -- the superlinear case [0.0]
We develop a new class of algorithms that are stable under such non-linearities, called tamed interacting particle Langevin algorithms (tIPLA). We obtain non-asymptotic convergence error estimates in Wasserstein-2 distance for the new class under an optimal rate.
arXiv Detail & Related papers (2024-03-28T17:11:25Z)
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates. We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem. A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network. The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented. The distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z)
Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous time limit (noiseless kinetic Langevin) when a finite step-size is employed. This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.