DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs
- URL: http://arxiv.org/abs/2208.13810v1
- Date: Mon, 29 Aug 2022 18:01:42 GMT
- Title: DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs
- Authors: Chaouki Ben Issaid, Anis Elgabli and Mehdi Bennis
- Abstract summary: We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
- Score: 54.08445874064361
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose to solve a regularized distributionally robust
learning problem in the decentralized setting, taking into account the data
distribution shift. By adding a Kullback-Leibler regularization function to the
robust min-max optimization problem, the learning problem can be reduced to a
modified robust minimization problem and solved efficiently. Leveraging the
newly formulated optimization problem, we propose a robust version of
Decentralized Stochastic Gradient Descent (DSGD), coined Distributionally
Robust Decentralized Stochastic Gradient Descent (DR-DSGD). Under some mild
assumptions and provided that the regularization parameter is larger than one,
we theoretically prove that DR-DSGD achieves a convergence rate of
$\mathcal{O}\left(1/\sqrt{KT} + K/T\right)$, where $K$ is the number of devices
and $T$ is the number of iterations. Simulation results show that our proposed
algorithm can improve the worst distribution test accuracy by up to $10\%$.
Moreover, DR-DSGD is more communication-efficient than DSGD since it requires
fewer communication rounds (up to $20$ times less) to achieve the same worst
distribution test accuracy target. Furthermore, the conducted experiments
reveal that DR-DSGD results in a fairer performance across devices in terms of
test accuracy.
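For readers who want the reduction mentioned in the abstract spelled out, here is the standard identity for KL-regularized distributionally robust objectives, written in generic notation (the symbols $f_i$, $\lambda$, and the uniform reference distribution are assumptions of this sketch, not necessarily the paper's exact formulation). With per-device losses $f_i(\theta)$ for $i = 1, \dots, K$ and regularization parameter $\lambda > 0$,
$$\min_{\theta} \max_{p \in \Delta_K} \; \sum_{i=1}^{K} p_i f_i(\theta) - \lambda\, \mathrm{KL}\!\left(p \,\|\, \tfrac{1}{K}\mathbf{1}\right) \;=\; \min_{\theta}\; \lambda \log\!\left(\frac{1}{K}\sum_{i=1}^{K} \exp\!\big(f_i(\theta)/\lambda\big)\right),$$
so the inner maximization over the distribution $p$ disappears and the problem becomes an ordinary minimization of a log-sum-exp of the local losses. Since $\log$ is monotone, one decoupled surrogate is $h_i(\theta) = \lambda \exp\!\big(f_i(\theta)/\lambda\big)$ per device, whose gradient $\exp\!\big(f_i(\theta)/\lambda\big)\nabla f_i(\theta)$ simply reweights the local gradient by an exponential of the local loss.
To make the resulting update concrete, the following is a minimal NumPy sketch of a DSGD-style iteration in which every device mixes its model with its neighbours and then steps along its exponentially reweighted local gradient. It is an illustration under stated assumptions (quadratic local losses, a ring mixing matrix, a hypothetical clipping safeguard), not the authors' implementation of DR-DSGD.
```python
import numpy as np

# Sketch of a DR-DSGD-style update (assumptions: quadratic local losses
# f_i(theta) = 0.5 * ||A_i @ theta - b_i||^2, a fixed doubly stochastic ring
# mixing matrix W, and the per-device surrogate h_i = lam * exp(f_i / lam),
# whose gradient is exp(f_i / lam) * grad f_i).

rng = np.random.default_rng(0)
K, d = 4, 5                    # number of devices, model dimension
lam, eta, T = 5.0, 0.01, 500   # KL regularization, step size, iterations

A = rng.normal(size=(K, 8, d))
b = rng.normal(size=(K, 8))

# Ring-topology mixing matrix (symmetric, hence doubly stochastic).
W = np.zeros((K, K))
for i in range(K):
    W[i, i] = 0.5
    W[i, (i - 1) % K] = 0.25
    W[i, (i + 1) % K] = 0.25

def loss_and_grad(i, theta):
    """Local loss f_i and its gradient at theta."""
    r = A[i] @ theta - b[i]
    return 0.5 * (r @ r), A[i].T @ r

theta = np.zeros((K, d))       # one model copy per device
for t in range(T):
    mixed = W @ theta          # gossip/consensus step with neighbours
    grads = np.zeros_like(theta)
    for i in range(K):
        f_i, g_i = loss_and_grad(i, theta[i])
        # Robust reweighting: devices with larger local loss get larger weight.
        # Clipping the exponent is a hypothetical numerical safeguard.
        grads[i] = np.exp(min(f_i / lam, 30.0)) * g_i
    theta = mixed - eta * grads

print("worst local loss:", max(loss_and_grad(i, theta[i])[0] for i in range(K)))
```
Larger $\lambda$ pushes the exponential weights toward uniform (recovering plain DSGD), while smaller $\lambda$ focuses the updates on the devices with the worst local loss, which is the mechanism behind the fairness and worst-case accuracy gains reported above.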
Related papers
- Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient in order of arrival.
We then develop an improved algorithm, which reduces the dynamic regret bound achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z)
- Scaling up Stochastic Gradient Descent for Non-convex Optimisation [5.908471365011942]
We propose a novel approach to the problem of shared parallel computation.
By combining two strategies into a unified framework, DPSGD achieves a better computational trade-off.
The potential gains of DPSGD are demonstrated on a latent Dirichlet allocation (LDA) inference problem and on a deep reinforcement learning (DRL) problem (advantage actor-critic, A2C).
arXiv Detail & Related papers (2022-10-06T13:06:08Z)
- Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning [33.590006101071765]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers.
We show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations.
arXiv Detail & Related papers (2022-08-04T10:57:25Z)
- Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
arXiv Detail & Related papers (2022-06-27T03:45:02Z)
- Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, namely DPSVRG, to accelerate the decentralized training by leveraging the variance reduction technique.
arXiv Detail & Related papers (2021-12-20T08:23:36Z)
- Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis [16.499651513178012]
Distributionally robust optimization (DRO) is a widely-used approach to learn models that are robust against distribution shift.
We provide non-asymptotic convergence guarantees even though the objective function is possibly non-convex and non-smooth, using a normalized gradient descent method.
arXiv Detail & Related papers (2021-10-24T14:56:38Z)
- Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD [15.112499553818953]
We study the non-asymptotic convergence property of the D$^2$/Exact-diffusion algorithm.
Compared with existing decentralized algorithms, D$^2$/Exact-diffusion is least sensitive to network topology.
arXiv Detail & Related papers (2021-05-17T17:16:52Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
Communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed learning problems.
We also propose DEFA to accelerate the generalization of DEF, which shows a better generalization bound than DEF.
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
- Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)