DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs
- URL: http://arxiv.org/abs/2208.13810v1
- Date: Mon, 29 Aug 2022 18:01:42 GMT
- Title: DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs
- Authors: Chaouki Ben Issaid, Anis Elgabli and Mehdi Bennis
- Abstract summary: We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
- Score: 54.08445874064361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose to solve a regularized distributionally robust
learning problem in the decentralized setting, taking into account the data
distribution shift. By adding a Kullback-Leibler regularization function to the
robust min-max optimization problem, the learning problem can be reduced to a
modified robust minimization problem and solved efficiently. Leveraging the
newly formulated optimization problem, we propose a robust version of
Decentralized Stochastic Gradient Descent (DSGD), coined Distributionally
Robust Decentralized Stochastic Gradient Descent (DR-DSGD). Under some mild
assumptions and provided that the regularization parameter is larger than one,
we theoretically prove that DR-DSGD achieves a convergence rate of
$\mathcal{O}\left(1/\sqrt{KT} + K/T\right)$, where $K$ is the number of devices
and $T$ is the number of iterations. Simulation results show that our proposed
algorithm can improve the worst distribution test accuracy by up to $10\%$.
Moreover, DR-DSGD is more communication-efficient than DSGD since it requires
fewer communication rounds (up to $20$ times less) to achieve the same worst
distribution test accuracy target. Furthermore, the conducted experiments
reveal that DR-DSGD results in a fairer performance across devices in terms of
test accuracy.
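For readers who want the reduction mentioned in the abstract spelled out, here is the standard identity for KL-regularized distributionally robust objectives, written in generic notation (the symbols $f_i$, $\lambda$, and the uniform reference distribution are assumptions of this sketch, not necessarily the paper's exact formulation). With per-device losses $f_i(\theta)$ for $i = 1, \dots, K$ and regularization parameter $\lambda > 0$,
$$\min_{\theta} \max_{p \in \Delta_K} \; \sum_{i=1}^{K} p_i f_i(\theta) - \lambda\, \mathrm{KL}\!\left(p \,\|\, \tfrac{1}{K}\mathbf{1}\right) \;=\; \min_{\theta}\; \lambda \log\!\left(\frac{1}{K}\sum_{i=1}^{K} \exp\!\big(f_i(\theta)/\lambda\big)\right),$$
so the inner maximization over the distribution $p$ disappears and the problem becomes an ordinary minimization of a log-sum-exp of the local losses. Since $\log$ is monotone, one decoupled surrogate is $h_i(\theta) = \lambda \exp\!\big(f_i(\theta)/\lambda\big)$ per device, whose gradient $\exp\!\big(f_i(\theta)/\lambda\big)\nabla f_i(\theta)$ simply reweights the local gradient by an exponential of the local loss.
To make the resulting update concrete, the following is a minimal NumPy sketch of a DSGD-style iteration in which every device mixes its model with its neighbours and then steps along its exponentially reweighted local gradient. It is an illustration under stated assumptions (quadratic local losses, a ring mixing matrix, a hypothetical clipping safeguard), not the authors' implementation of DR-DSGD.
```python
import numpy as np

# Sketch of a DR-DSGD-style update (assumptions: quadratic local losses
# f_i(theta) = 0.5 * ||A_i @ theta - b_i||^2, a fixed doubly stochastic ring
# mixing matrix W, and the per-device surrogate h_i = lam * exp(f_i / lam),
# whose gradient is exp(f_i / lam) * grad f_i).

rng = np.random.default_rng(0)
K, d = 4, 5                    # number of devices, model dimension
lam, eta, T = 5.0, 0.01, 500   # KL regularization, step size, iterations

A = rng.normal(size=(K, 8, d))
b = rng.normal(size=(K, 8))

# Ring-topology mixing matrix (symmetric, hence doubly stochastic).
W = np.zeros((K, K))
for i in range(K):
    W[i, i] = 0.5
    W[i, (i - 1) % K] = 0.25
    W[i, (i + 1) % K] = 0.25

def loss_and_grad(i, theta):
    """Local loss f_i and its gradient at theta."""
    r = A[i] @ theta - b[i]
    return 0.5 * (r @ r), A[i].T @ r

theta = np.zeros((K, d))       # one model copy per device
for t in range(T):
    mixed = W @ theta          # gossip/consensus step with neighbours
    grads = np.zeros_like(theta)
    for i in range(K):
        f_i, g_i = loss_and_grad(i, theta[i])
        # Robust reweighting: devices with larger local loss get larger weight.
        # Clipping the exponent is a hypothetical numerical safeguard.
        grads[i] = np.exp(min(f_i / lam, 30.0)) * g_i
    theta = mixed - eta * grads

print("worst local loss:", max(loss_and_grad(i, theta[i])[0] for i in range(K)))
```
Larger $\lambda$ pushes the exponential weights toward uniform (recovering plain DSGD), while smaller $\lambda$ focuses the updates on the devices with the worst local loss, which is the mechanism behind the fairness and worst-case accuracy gains reported above.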
Related papers
- Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient in order of arrival.
We then develop an improved algorithm, which reduces the dynamic regret bound achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z)
- Scaling up Stochastic Gradient Descent for Non-convex Optimisation [5.908471365011942]
We propose a novel approach to the problem of shared parallel computation.
By combining two strategies into a unified framework, DPSGD achieves a better computational trade-off.
The potential gains of DPSGD are demonstrated on a latent Dirichlet allocation (LDA) inference problem and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C).
arXiv Detail & Related papers (2022-10-06T13:06:08Z)
- Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning [33.590006101071765]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers.
We show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations.
arXiv Detail & Related papers (2022-08-04T10:57:25Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique.
arXiv Detail & Related papers (2021-12-20T08:23:36Z)
- Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis [16.499651513178012]
Distributionally robust optimization (DRO) is a widely-used approach to learn models that are robust against distribution shift.
We provide non-asymptotic convergence guarantees even though the objective function is possibly non-convex and non-smooth, using a normalized gradient descent method.
arXiv Detail & Related papers (2021-10-24T14:56:38Z)
- Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD [15.112499553818953]
We study the non-asymptotic convergence property of the D$^2$/Exact-diffusion algorithm.
Compared with existing decentralized algorithms, D$^2$/Exact-diffusion is least sensitive to network topology.
arXiv Detail & Related papers (2021-05-17T17:16:52Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed learning problems.
We also propose DEFA to accelerate the generalization of DEF, which shows a better generalization bound than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)