On Centralized and Distributed Mirror Descent: Exponential Convergence
Analysis Using Quadratic Constraints
- URL: http://arxiv.org/abs/2105.14385v1
- Date: Sat, 29 May 2021 23:05:56 GMT
- Title: On Centralized and Distributed Mirror Descent: Exponential Convergence
Analysis Using Quadratic Constraints
- Authors: Youbang Sun, Mahyar Fazlyab, Shahin Shahrampour
- Abstract summary: Mirror descent (MD) is a powerful first-order optimization technique that subsumes several algorithms including gradient descent (GD)
We study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems.
- Score: 8.336315962271396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mirror descent (MD) is a powerful first-order optimization technique that
subsumes several optimization algorithms including gradient descent (GD). In
this work, we study the exact convergence rate of MD in both centralized and
distributed cases for strongly convex and smooth problems. We view MD with a
dynamical system lens and leverage quadratic constraints (QCs) to provide
convergence guarantees based on the Lyapunov stability. For centralized MD, we
establish a semi-definite programming (SDP) that certifies exponentially fast
convergence of MD subject to a linear matrix inequality (LMI). We prove that
the SDP always has a feasible solution that recovers the optimal GD rate. Next,
we analyze the exponential convergence of distributed MD and characterize the
rate using two LMIs. To the best of our knowledge, the exact (exponential) rate
of distributed MD has not been previously explored in the literature. We
present numerical results as a verification of our theory and observe that the
richness of the Lyapunov function entails better (worst-case) convergence rates
compared to existing works on distributed GD.
Related papers
- Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two.
Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning [3.5051814539447474]
This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL)
B-Maps operate on Q-functions represented in a kernel Hilbert space, enabling a nonparametric formulation.
Numerical experiments on two well-known control problems demonstrate the superior performance of the proposed nonparametric B-Maps.
arXiv Detail & Related papers (2025-03-20T14:39:21Z) - Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Preference-Based Multi-Agent Reinforcement Learning (PbMARL)
We identify the Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for PbMARL.
arXiv Detail & Related papers (2024-09-01T13:14:41Z) - A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics [4.757470449749877]
We show that these metrics can be unified under a general framework of kernel-based two-sample statistics.
This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics.
As illustrative applications, we demonstrate how these bounds facilitate the component of error bounds for procedures such as distance covariance-based dimension reduction.
arXiv Detail & Related papers (2024-05-22T22:41:56Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex
Models and Heterogeneous Data [0.261072980439312]
We propose a unified paradigm called U.MP, D-MP and GT-D, which provides a convergence guarantee for non general objectives.
In theory we provide the convergence analysis objectives two approaches for these non-MP algorithms.
arXiv Detail & Related papers (2023-03-01T02:13:22Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimates the intractable marginal likelihood of deep generative models.
We present a parameteric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer number of steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Liebler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10%$.
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - Utilising the CLT Structure in Stochastic Gradient based Sampling :
Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD)
We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian.
We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Exact and Approximation Algorithms for Sparse PCA [1.7640556247739623]
This paper proposes two exact mixed-integer SDPs (MISDPs)
We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-art one.
Since off-the-shelf solvers, in general, have difficulty in solving MISDPs, we approximate SPCA with arbitrary accuracy by a mixed-integer linear program (MILP) of a similar size as MISDPs.
arXiv Detail & Related papers (2020-08-28T02:07:08Z) - A Decentralized Approach to Bayesian Learning [26.74338464389837]
Motivated by decentralized approaches to machine learning, we propose a collaborative learning taking the form of decentralized Langevin dynamics.
Our analysis show that the initial KL-divergence between the Markov Chain and the target posterior distribution is exponentially decreasing.
The performance of individual agents with locally available data is on par with the centralized setting with considerable improvement in the rate.
arXiv Detail & Related papers (2020-07-14T03:59:17Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC)
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.