Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization
- URL: http://arxiv.org/abs/2404.13669v1
- Date: Sun, 21 Apr 2024 14:18:49 GMT
- Title: Rate Analysis of Coupled Distributed Stochastic Approximation for Misspecified Optimization
- Authors: Yaqun Yang, Jinlong Lei
- Abstract summary: We consider an $n$-agent distributed optimization problem with imperfect information characterized in a parametric sense.
We propose a coupled distributed stochastic approximation algorithm, in which every agent updates the current beliefs of its unknown parameter.
We quantitatively characterize the factors that affect the algorithm performance, and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}(\frac{1}{nk})+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}+\mathcal{O}\big(\frac{1}{(1-\rho_w)^2}\big)\frac{1}{k^2}$.
- Score: 0.552480439325792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider an $n$-agent distributed optimization problem with imperfect information characterized in a parametric sense, where the unknown parameter can be obtained by solving a distinct distributed parameter learning problem. Though each agent only has access to its local parameter learning and computational problem, the agents aim to collaboratively minimize the average of their local cost functions. To address this problem, we propose a coupled distributed stochastic approximation algorithm, in which every agent updates the current beliefs of its unknown parameter and decision variable by a stochastic approximation method, and then averages the beliefs and decision variables of its neighbors over the network via a consensus protocol. Our interest lies in the convergence analysis of this algorithm. We quantitatively characterize the factors that affect the algorithm performance, and prove that the mean-squared error of the decision variable is bounded by $\mathcal{O}(\frac{1}{nk})+\mathcal{O}\left(\frac{1}{\sqrt{n}(1-\rho_w)}\right)\frac{1}{k^{1.5}}+\mathcal{O}\big(\frac{1}{(1-\rho_w)^2} \big)\frac{1}{k^2}$, where $k$ is the iteration count and $(1-\rho_w)$ is the spectral gap of the network weighted adjacency matrix. This reveals that the network connectivity characterized by $(1-\rho_w)$ only influences the higher-order terms of the convergence rate, while the dominant rate is the same as that of the centralized algorithm. In addition, we show that the number of transient iterations needed to reach the dominant rate $\mathcal{O}(\frac{1}{nk})$ is $\mathcal{O}(\frac{n}{(1-\rho_w)^2})$. Numerical experiments are carried out to demonstrate the theoretical results by taking different CPUs as agents, which is more applicable to real-world distributed scenarios.
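The abstract describes the algorithm only in words, so the following Python sketch is one possible reading of it, not the authors' implementation. Everything concrete here is an assumption made for illustration: the scalar quadratic costs, the ring network with equal weights, the step size `1 / (k + 10)`, the Gaussian gradient noise, and all function names. `rate_terms` evaluates the three terms of the stated bound with every hidden constant set to 1, which is also an assumption.

```python
import numpy as np


def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    weight 1/3 on each agent itself and on its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def spectral_gap(W):
    """Return 1 - rho_w, with rho_w the spectral norm of W - (1/n) 11^T."""
    n = W.shape[0]
    rho_w = np.linalg.norm(W - np.ones((n, n)) / n, 2)
    return 1.0 - rho_w


def coupled_dsa(a, b, W, num_iters=5000, noise_std=1.0, seed=0):
    """Coupled distributed stochastic approximation on a toy problem.

    Assumed local problems for agent i (hypothetical, for illustration):
      parameter learning: h_i(theta)    = 0.5 * (theta - b_i)^2
      decision problem:   f_i(x, theta) = 0.5 * (x - a_i * theta)^2
    so theta* = mean(b) and x* = mean(a) * theta*.
    """
    rng = np.random.default_rng(seed)
    n = len(a)
    theta = np.zeros(n)  # each agent's belief about the unknown parameter
    x = np.zeros(n)      # each agent's decision variable
    for k in range(num_iters):
        step = 1.0 / (k + 10)  # diminishing step size (assumed schedule)
        # Noisy local gradients; the decision gradient uses the agent's
        # current parameter belief, which couples the two recursions.
        g_theta = (theta - b) + noise_std * rng.standard_normal(n)
        g_x = (x - a * theta) + noise_std * rng.standard_normal(n)
        # Local stochastic approximation step, then consensus averaging.
        theta = W @ (theta - step * g_theta)
        x = W @ (x - step * g_x)
    return theta, x


def rate_terms(n, rho_w, k):
    """Three terms of the stated bound with all constants set to 1."""
    gap = 1.0 - rho_w
    return (
        1.0 / (n * k),
        1.0 / (np.sqrt(n) * gap * k ** 1.5),
        1.0 / (gap ** 2 * k ** 2),
    )


if __name__ == "__main__":
    n = 8
    W = ring_weights(n)
    gap = spectral_gap(W)
    a = np.linspace(1.0, 2.0, n)
    b = np.linspace(-1.0, 3.0, n)
    theta, x = coupled_dsa(a, b, W)
    print("spectral gap:", gap)
    print("max parameter error:", np.max(np.abs(theta - b.mean())))
    print("max decision error:", np.max(np.abs(x - a.mean() * b.mean())))
    print("bound terms at k = n / gap^2:", rate_terms(n, 1.0 - gap, n / gap ** 2))
```

With unit constants, the three terms of the bound coincide exactly at $k = n/(1-\rho_w)^2$, and for larger $k$ the $\frac{1}{nk}$ term is the largest. This is consistent with the transient iteration count $\mathcal{O}(\frac{n}{(1-\rho_w)^2})$ stated in the abstract.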
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
- Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell^\infty}$ metric used to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z)
- Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning [9.31522898261934]
We investigate the impact of compression on gradient algorithms for machine learning.
We highlight differences in terms of convergence rates between several unbiased compression operators.
We extend our results to the case of federated learning.
arXiv Detail & Related papers (2023-08-02T18:02:00Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency on $p$ from $\mathcal{O}(p^{-1})$ to $\mathcal{O}(p^{-1})$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning [3.655021726150368]
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared problem in sequential decision-making.
We derive a novel law of iterated logarithm for a family of distributed nonlinear approximation schemes that is useful in MARL.
arXiv Detail & Related papers (2021-10-27T08:01:17Z)
- Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{(1-\rho)}}\log 1/\varepsilon\big)$ communication steps.
This rate matches (up to poly-log factors) for the first time lower complexity communication bounds of distributed gossip-algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z)
- Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method [3.860616339202303]
This paper investigates the distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions.
Agents approximate their own ZO coordinate method to solve the problem.
arXiv Detail & Related papers (2021-03-24T03:07:46Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
On two popular deep learning tasks, we demonstrate empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling [56.394284787780364]
This paper provides the first theoretical convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$.
arXiv Detail & Related papers (2020-02-15T00:26:49Z)