Convex Analysis of the Mean Field Langevin Dynamics
- URL: http://arxiv.org/abs/2201.10469v1
- Date: Tue, 25 Jan 2022 17:13:56 GMT
- Title: Convex Analysis of the Mean Field Langevin Dynamics
- Authors: Atsushi Nitanda, Denny Wu, Taiji Suzuki
- Abstract summary: A convergence rate analysis of the mean field Langevin dynamics is presented. A proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
- Score: 49.66486092259375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an example of the nonlinear Fokker-Planck equation, the mean field
Langevin dynamics attracts attention due to its connection to (noisy) gradient
descent on infinitely wide neural networks in the mean field regime, and hence
the convergence property of the dynamics is of great theoretical interest. In
this work, we give a simple and self-contained convergence rate analysis of the
mean field Langevin dynamics with respect to the (regularized) objective
function in both continuous and discrete time settings. The key ingredient of
our proof is a proximal Gibbs distribution $p_q$ associated with the dynamics,
which, in combination with techniques in [Vempala and Wibisono (2019)], allows us
to develop a convergence theory parallel to classical results in convex
optimization. Furthermore, we reveal that $p_q$ connects to the duality gap in
the empirical risk minimization setting, which enables efficient empirical
evaluation of the algorithm convergence.
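The connection to noisy gradient descent on particles can be illustrated with a minimal sketch. The snippet below simulates a finite-particle Euler-Maruyama discretization of a mean field Langevin dynamics: each particle takes a noisy gradient step whose drift depends on the empirical distribution of all particles (here only through their mean). The quadratic potential, the interaction term, and all parameter values are illustrative assumptions, not the paper's setting or algorithm.

```python
# Hypothetical finite-particle sketch of mean field Langevin dynamics (MFLD).
# Illustrative choices (not from the paper): potential V(x) = x^2 / 2,
# a mean-field attraction of strength `alpha` toward the empirical mean,
# and temperature `lam` controlling the entropic regularization.
import numpy as np

def mfld_step(x, eta, lam, alpha, rng):
    """One Euler-Maruyama step of the interacting particle system."""
    # Drift = -grad V(x) - alpha * (x - empirical mean): the second term
    # is what makes the dynamics distribution-dependent (mean field).
    drift = -(x + alpha * (x - x.mean()))
    noise = rng.standard_normal(x.shape)
    return x + eta * drift + np.sqrt(2.0 * eta * lam) * noise

def run_mfld(n_particles=2000, n_steps=2000, eta=0.05, lam=0.5, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)
    for _ in range(n_steps):
        x = mfld_step(x, eta, lam, alpha, rng)
    return x

particles = run_mfld()
# For this quadratic example the stationary law is Gaussian with mean 0 and
# variance roughly lam / (1 + alpha) = 0.25, up to discretization bias.
print(particles.mean(), particles.var())
```

For this toy objective the dynamics is an Ornstein-Uhlenbeck-type process, so the particle cloud settles near the Gibbs distribution of the regularized objective; the paper's analysis concerns the convergence rate toward such a minimizer in the general nonlinear case.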
Related papers
- Sampling with Adaptive Variance for Multimodal Distributions [14.121491356732188]
We propose and analyze a class of distribution sampling algorithms for a bounded domain.
We show that a derivative-free version can be used for sampling without information on the Gibbs potential.
arXiv Detail & Related papers (2024-11-20T22:05:47Z)
- Quantum space-time Poincaré inequality for Lindblad dynamics [15.031583573428481]
We derive explicit and constructive exponential decay estimates for the convergence in the noncommutative $L^2$-norm.
Our analysis relies on establishing a quantum analog of space-time Poincaré inequalities.
A number of concrete examples are provided as applications of our theoretical results.
arXiv Detail & Related papers (2024-06-13T13:43:41Z)
- Taming the Interacting Particle Langevin Algorithm -- the superlinear case [0.0]
We develop a new class of algorithms, stable under such non-linearities, called tamed interacting particle Langevin algorithms (tIPLA).
We obtain non-asymptotic convergence error estimates in Wasserstein-2 distance for the new class at the optimal rate.
arXiv Detail & Related papers (2024-03-28T17:11:25Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T)/T^{1-\frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems [78.96969465641024]
We extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
We also study time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result.
arXiv Detail & Related papers (2023-12-02T13:01:29Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems [42.375903320536715]
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures.
We provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure.
arXiv Detail & Related papers (2023-03-06T08:05:08Z)
- Structured Optimal Variational Inference for Dynamic Latent Space Models [16.531262817315696]
We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions.
To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework.
arXiv Detail & Related papers (2022-09-29T22:10:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.