Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach
- URL: http://arxiv.org/abs/2406.17054v2
- Date: Wed, 26 Jun 2024 16:12:27 GMT
- Title: Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach
- Authors: Guillaume Wang, Alireza Mousavi-Hosseini, Lénaïc Chizat
- Abstract summary: Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle convex optimization over probability measures on a manifold.
We show how to extend the MFLD framework to convex optimization problems over signed measures.
- Score: 4.577104493960515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle convex optimization over probability measures on a manifold; these methods are scalable, versatile, and enjoy computational guarantees. However, some important problems -- such as risk minimization for infinite-width two-layer neural networks, or sparse deconvolution -- are originally defined over the set of signed, rather than probability, measures. In this paper, we investigate how to extend the MFLD framework to convex optimization problems over signed measures. Among two known reductions from signed to probability measures -- the lifting and the bilevel approaches -- we show that the bilevel reduction leads to stronger guarantees and faster rates (at the price of a higher per-iteration complexity). In particular, we investigate the convergence rate of MFLD applied to the bilevel reduction in the low-noise regime and obtain two results. First, this dynamics is amenable to an annealing schedule, adapted from Suzuki et al. (2023), that results in improved convergence rates to a fixed multiplicative accuracy. Second, we investigate the problem of learning a single neuron with the bilevel approach and obtain local exponential convergence rates that depend polynomially on the dimension and noise level (to compare with the exponential dependence that would result from prior analyses).
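For orientation, the following is a minimal sketch of the plain MFLD particle update the abstract refers to: noisy gradient descent on a system of particles whose empirical measure approximates the optimizing probability measure. The quadratic potential, step size, and all names are illustrative assumptions; the paper's bilevel reduction and annealing schedule are not reproduced here.

```python
import numpy as np

def mfld_step(particles, grad_potential, step, temperature, rng):
    """One Euler-Maruyama step of mean-field Langevin dynamics:
    a gradient step on each particle plus Gaussian noise, so the
    particle cloud approximates the entropically regularized optimum."""
    drift = -grad_potential(particles)
    noise = np.sqrt(2.0 * temperature * step) * rng.standard_normal(particles.shape)
    return particles + step * drift + noise

# Toy potential V(x) = ||x||^2 / 2, so grad V(x) = x (hypothetical example).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 2))  # 256 particles in R^2
for _ in range(2000):
    X = mfld_step(X, lambda p: p, step=1e-2, temperature=1e-2, rng=rng)
# X now approximates the Gibbs measure proportional to exp(-V(x) / temperature).
```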
Related papers
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than prior art.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based methods.
arXiv Detail & Related papers (2024-05-15T19:03:08Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
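As a point of reference for the rate above, here is a minimal single-machine AdaGrad update (per-coordinate step sizes shrinking with accumulated squared gradients). The paper's over-the-air federated aggregation and channel-noise model are not modeled, and the toy objective and names are assumptions.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: each coordinate's effective step size decays
    with the squared gradients accumulated so far."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy quadratic f(w) = ||w||^2 / 2, so grad f(w) = w (hypothetical example).
w, accum = np.ones(5), np.zeros(5)
for _ in range(500):
    w, accum = adagrad_step(w, w, accum)
```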
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- On the Communication Complexity of Decentralized Bilevel Optimization [40.45379954138305]
We propose two novel decentralized bilevel gradient descent algorithms based on simultaneous and alternating update strategies.
Our algorithms can achieve faster convergence rates and lower communication costs than existing methods.
This is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting.
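To make the alternating update strategy concrete, here is a minimal single-machine sketch on a toy quadratic bilevel problem; the decentralized communication that is the paper's actual subject is not modeled, and the problem data and step sizes are assumptions.

```python
import numpy as np

# Toy bilevel problem (hypothetical):
#   min_x f(x, y*(x)),  where  y*(x) = argmin_y g(x, y),
# with f(x, y) = ||x - y||^2 / 2 and g(x, y) = ||y - A x||^2 / 2,
# so y*(x) = A x and dy*/dx = A (closed-form hypergradient available).
A = np.array([[0.5, 0.2], [0.1, 0.8]])

x, y = np.ones(2), np.zeros(2)
inner_lr, outer_lr = 0.5, 0.1
for _ in range(2000):
    # Alternating strategy: one lower-level step tracking y*(x) ...
    y = y - inner_lr * (y - A @ x)
    # ... then one upper-level step using the plug-in hypergradient
    # df/dx + (dy*/dx)^T df/dy evaluated at the current y.
    x = x - outer_lr * ((x - y) + A.T @ (y - x))
# x converges to the bilevel optimum (here x = 0).
```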
arXiv Detail & Related papers (2023-11-19T14:56:26Z)
- Communication-Efficient Federated Bilevel Optimization with Local and Global Lower Level Problems [118.00379425831566]
We propose a communication-efficient algorithm, named FedBiOAcc.
We prove that FedBiOAcc-Local converges at the same rate for this class of problems.
Empirical results show superior performance of our algorithms.
arXiv Detail & Related papers (2023-02-13T21:28:53Z)
- Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong Convexity [43.092883358935545]
We propose a single-loop Bi-level Averaged Method of Multipliers (sl-BAMM) for Bi-Level Optimization (BLO).
We provide a non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points; the comparative advantage of our analysis lies in the absence of a strong gradient-boundedness assumption.
arXiv Detail & Related papers (2023-02-07T11:29:05Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
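A minimal sketch of this idea, assuming a geometric annealing path and unadjusted Langevin transitions in place of Metropolis-Hastings (which is what keeps every step differentiable, at the cost of discretization bias); the toy target and all constants are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, step = 2000, 1, 200, 1e-2

def log_p0(x):  # easy prior N(0, 4I), unnormalized
    return -np.sum(x**2, axis=-1) / 8.0

def log_p1(x):  # toy double-well target, unnormalized (hypothetical example)
    return -np.sum((x**2 - 1.0)**2, axis=-1)

def grad_log_pt(x, beta):  # gradient of the geometric interpolation
    return (1 - beta) * (-x / 4.0) + beta * (-4.0 * x * (x**2 - 1.0))

x = 2.0 * rng.standard_normal((n, d))  # samples from p0
log_w = np.zeros(n)                    # importance weights (log scale)
betas = np.linspace(0.0, 1.0, K + 1)
for b_prev, b in zip(betas[:-1], betas[1:]):
    # Standard AIS weight increment along the geometric path.
    log_w += (b - b_prev) * (log_p1(x) - log_p0(x))
    # Unadjusted Langevin transition: no Metropolis-Hastings correction.
    x += step * grad_log_pt(x, b) + np.sqrt(2 * step) * rng.standard_normal(x.shape)

# Stable log of the estimated ratio of normalizing constants Z1 / Z0.
log_Z_ratio = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
```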
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
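For concreteness, this is what a clipped stochastic (sub)gradient step looks like on a non-smooth toy problem with heavy-tailed noise; the O(1/sqrt(t)) stepsize and the clipping level below are generic choices, not the stepsize rules derived in the paper.

```python
import numpy as np

def clip(g, lam):
    """Scale the gradient down so its norm is at most lam (gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

# Clipped SGD on f(w) = ||w||_1 (non-smooth; subgradient is sign(w)),
# with Student-t gradient noise as a stand-in for heavy tails
# (hypothetical example).
rng = np.random.default_rng(0)
w = 5.0 * np.ones(10)
for t in range(1, 5001):
    g = np.sign(w) + rng.standard_t(df=2.5, size=w.shape)
    w -= (1.0 / np.sqrt(t)) * clip(g, lam=1.0)
```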
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to deep learning applications (e.g., deep AUC maximization).
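For orientation, a minimal two-time-scale gradient descent-ascent loop on a toy smooth min-max objective; the objective and step sizes are assumptions, and the paper's stage-wise algorithms and PL-condition analysis are not reproduced.

```python
# Toy min-max problem, strongly concave in y (hypothetical example):
#   min_x max_y f(x, y) = x^2 * y - y^2 / 2
x, y = 1.0, 0.0
lr_x, lr_y = 1e-3, 1e-1  # descent on x runs slower than ascent on y
for _ in range(20000):
    gx = 2 * x * y       # df/dx
    gy = x**2 - y        # df/dy
    x -= lr_x * gx       # descent step on the min player
    y += lr_y * gy       # ascent step on the max player
# (x, y) approaches the min-max point (0, 0).
```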
arXiv Detail & Related papers (2020-06-12T00:32:21Z)