A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient
Descent with Differential Privacy
- URL: http://arxiv.org/abs/2008.09246v1
- Date: Fri, 21 Aug 2020 00:56:22 GMT
- Title: A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient
Descent with Differential Privacy
- Authors: Jie Xu, Wei Zhang, Fei Wang
- Abstract summary: A popular distributed learning strategy is federated learning, where there is a central server storing the global model and a set of local computing nodes updating the model parameters with their corresponding data.
In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD (ADPSGD) framework, A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference attacks by malicious participants.
- Score: 15.038697541988746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep learning models are usually massive and complex, distributed learning
is essential for increasing training efficiency. Moreover, in many real-world
application scenarios like healthcare, distributed learning can also keep the
data local and protect privacy. A popular distributed learning strategy is
federated learning, where there is a central server storing the global model
and a set of local computing nodes updating the model parameters with their
corresponding data. The updated model parameters will be processed and
transmitted to the central server, which leads to heavy communication costs.
Recently, asynchronous decentralized distributed learning has been proposed and
demonstrated to be a more efficient and practical strategy where there is no
central server, so that each computing node only communicates with its
neighbors. Although no raw data will be transmitted across different local
nodes, there is still a risk of information leak during the communication
process for malicious participants to make attacks. In this paper, we present a
differentially private version of asynchronous decentralized parallel SGD
(ADPSGD) framework, or A(DP)$^2$SGD for short, which maintains the communication
efficiency of ADPSGD and prevents inference attacks by malicious participants.
Specifically, Rényi differential privacy is used to provide tighter privacy
analysis for our composite Gaussian mechanisms while the convergence rate is
consistent with the non-private version. Theoretical analysis shows
A(DP)$^2$SGD also converges at the optimal $\mathcal{O}(1/\sqrt{T})$ rate, the same as
SGD. Empirically, A(DP)$^2$SGD achieves model accuracy comparable to the
differentially private version of Synchronous SGD (SSGD) but runs much faster
than SSGD in heterogeneous computing environments.
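To make the mechanism concrete, below is a minimal sketch of one asynchronous round under assumed parameters: each worker clips per-sample gradients, perturbs their average with Gaussian noise, takes a local step, and then averages its model with one neighbor. The function names and noise calibration are illustrative; the paper's exact calibration follows its Rényi differential privacy analysis.

```python
import numpy as np

def local_dp_sgd_step(w, per_sample_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One differentially private local SGD step (Gaussian mechanism).

    Per-sample gradients are clipped to L2 norm `clip`, averaged, and perturbed
    with Gaussian noise. Illustrative only; not the paper's exact calibration."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12)) for g in per_sample_grads]
    g_bar = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip / len(per_sample_grads), size=g_bar.shape)
    return w - lr * (g_bar + noise)

def gossip_average(w_i, w_j):
    """Pairwise model averaging with a single neighbor (gossip-style update)."""
    w_avg = 0.5 * (w_i + w_j)
    return w_avg, w_avg

# Toy usage: two workers, each performs one asynchronous round.
rng = np.random.default_rng(0)
w = [rng.normal(size=5), rng.normal(size=5)]
grads = [rng.normal(size=(8, 5)), rng.normal(size=(8, 5))]  # fake per-sample gradients
for i, j in [(0, 1), (1, 0)]:            # worker i wakes up, updates, gossips with j
    w[i] = local_dp_sgd_step(w[i], grads[i], rng=rng)
    w[i], w[j] = gossip_average(w[i], w[j])
```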
Related papers
- Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization [65.85963235502322]
Federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead.
We propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates.
By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error.
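As a rough illustration of the shared-mask idea only (FedAdam-SSM additionally optimizes the mask against a divergence bound), the sketch below applies a single top-k mask jointly to a device's model update and its two Adam moment estimates before upload; the helper names and the 10% density are assumptions.

```python
import numpy as np

def topk_shared_mask(update, density=0.1):
    """Boolean mask keeping the largest-magnitude `density` fraction of entries."""
    k = max(1, int(density * update.size))
    thresh = np.partition(np.abs(update).ravel(), -k)[-k]
    return np.abs(update) >= thresh

def sparsify_for_upload(delta_w, m, v, density=0.1):
    """Apply one shared mask to the model update and both Adam moments, so the
    server receives aligned sparse tensors (illustrative sketch only)."""
    mask = topk_shared_mask(delta_w, density)
    return delta_w * mask, m * mask, v * mask, mask
```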
arXiv Detail & Related papers (2024-05-28T07:56:49Z) - Secure Aggregation Meets Sparsification in Decentralized Learning [1.7010199949406575]
This paper introduces CESAR, a novel secure aggregation protocol for Decentralized Learning (DL).
CESAR provably defends against honest-but-curious adversaries and can be formally adapted to counteract collusion between them.
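For intuition, the generic pairwise-masking idea behind many secure aggregation schemes can be sketched as below: each pair of nodes shares a random mask that one adds and the other subtracts, so individual contributions stay hidden while the sum is exact. This is not the CESAR protocol itself, which additionally has to make masking compatible with sparsified updates.

```python
import numpy as np

def masked_contributions(updates, seed=0):
    """Hide each node's update with pairwise masks that cancel in the sum
    (generic secure-aggregation intuition, not the CESAR protocol)."""
    n, rng = len(updates), np.random.default_rng(seed)
    masked = [u.astype(float) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)  # shared secret of pair (i, j)
            masked[i] = masked[i] + mask
            masked[j] = masked[j] - mask
    return masked

updates = [np.ones(4) * k for k in range(3)]
assert np.allclose(sum(masked_contributions(updates)), sum(updates))  # sum preserved
```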
arXiv Detail & Related papers (2024-05-13T12:52:58Z) - Accelerating Parallel Stochastic Gradient Descent via Non-blocking
Mini-batches [3.736244431175932]
Non-blocking SGD can address the straggler problem in heterogeneous environments and takes up to 2x less time to reach the same training loss.
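A minimal way to picture non-blocking gradient collection (a generic sketch, not the paper's system design): the update loop applies whichever worker gradients have arrived instead of waiting for the slowest worker.

```python
from concurrent.futures import ThreadPoolExecutor
import random, time
import numpy as np

def slow_gradient(w, worker_id):
    """Simulate a worker whose batch gradient takes a variable amount of time."""
    time.sleep(random.uniform(0.0, 0.05) * (worker_id + 1))  # heterogeneous speeds
    return -w + np.random.normal(scale=0.1, size=w.shape)    # toy gradient of 0.5*||w||^2

w = np.ones(4)
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_gradient, w.copy(), i) for i in range(4)]
    for _ in range(10):                      # update loop never blocks on stragglers
        ready = [f for f in futures if f.done()]
        for f in ready:
            w += 0.1 * f.result()            # apply whichever (possibly stale) gradients arrived
            futures.remove(f)
            futures.append(pool.submit(slow_gradient, w.copy(), random.randrange(4)))
        time.sleep(0.01)
```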
arXiv Detail & Related papers (2022-11-02T05:25:01Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst-distribution test accuracy by up to $10\%$.
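In generic form (the paper's decentralized formulation has additional structure), the KL-regularized distributionally robust objective over $K$ group losses $\ell_k(\theta)$ with mixture weights $p \in \Delta_K$ admits a closed-form reduction via log-sum-exp duality:
$$\min_{\theta}\max_{p\in\Delta_K}\ \sum_{k=1}^{K} p_k\,\ell_k(\theta) - \lambda\,\mathrm{KL}\!\left(p\,\Big\|\,\tfrac{1}{K}\mathbf{1}\right) \;=\; \min_{\theta}\ \lambda\log\!\left(\frac{1}{K}\sum_{k=1}^{K}\exp\!\big(\ell_k(\theta)/\lambda\big)\right),$$
whose gradient reweights each $\nabla\ell_k(\theta)$ in proportion to $\exp(\ell_k(\theta)/\lambda)$.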
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - Homogeneous Learning: Self-Attention Decentralized Deep Learning [0.6091702876917281]
We propose a decentralized learning model called Homogeneous Learning (HL) for tackling non-IID data with a self-attention mechanism.
HL achieves better performance than standalone learning while reducing total training rounds by 50.8% and communication cost by 74.6%.
arXiv Detail & Related papers (2021-10-11T14:05:29Z) - RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with a few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
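For reference, the generic scaled-form consensus ADMM iteration that mini-batch variants build on (a textbook form, not the paper's coded scheme) is, for agent $i$ with local loss $f_i$, consensus variable $z$, dual variable $u_i$, and penalty $\rho$:
$$x_i^{t+1} = \arg\min_{x_i}\ f_i(x_i) + \tfrac{\rho}{2}\|x_i - z^{t} + u_i^{t}\|^2,\qquad z^{t+1} = \tfrac{1}{N}\sum_{i=1}^{N}\big(x_i^{t+1} + u_i^{t}\big),\qquad u_i^{t+1} = u_i^{t} + x_i^{t+1} - z^{t+1}.$$
In the stochastic mini-batch setting, $f_i$ is replaced by a mini-batch estimate of the local loss at each iteration.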
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
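A minimal sketch of the kind of update involved, under assumed parameter names: clip and noise the gradient for differential privacy, then denoise it with a Laplacian-smoothing operator $(I + \sigma_{\mathrm{ls}} L)^{-1}$ applied via the FFT. This illustrates the general technique, not the paper's code.

```python
import numpy as np

def laplacian_smooth(g, sigma_ls=1.0):
    """Apply (I + sigma_ls * L)^{-1} to g, where L is the circulant 1-D discrete
    Laplacian; computed in O(d log d) via the FFT (assumes g.size >= 3)."""
    d = g.size
    kernel = np.zeros(d)
    kernel[0], kernel[1], kernel[-1] = 2.0, -1.0, -1.0     # Laplacian stencil
    denom = 1.0 + sigma_ls * np.fft.fft(kernel).real        # eigenvalues of I + sigma_ls * L
    return np.real(np.fft.ifft(np.fft.fft(g) / denom))

def dp_smoothed_step(w, grad, lr=0.1, clip=1.0, noise_std=1.0, rng=None):
    """Clip, add Gaussian noise (differential privacy), then Laplacian-smooth
    the noisy gradient before the descent step (illustrative only)."""
    rng = rng or np.random.default_rng()
    g = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))
    g_noisy = g + rng.normal(0.0, noise_std, size=g.shape)
    return w - lr * laplacian_smooth(g_noisy)
```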
arXiv Detail & Related papers (2020-05-01T04:28:38Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z) - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays
in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models will be pulled back towards the anchor model rather than communicating with others.
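A minimal sketch of the anchor-model idea under hypothetical names (the actual Overlap-Local-SGD also overlaps this communication with computation): run $\tau$ local SGD steps, then pull the local model part of the way back toward an anchor model.

```python
import numpy as np

def local_round_with_anchor(w, anchor, grad_fn, lr=0.1, tau=5, pull=0.5):
    """Run `tau` local SGD steps, then pull the local model toward the anchor
    model instead of synchronizing with every other worker."""
    for _ in range(tau):
        w = w - lr * grad_fn(w)
    return (1.0 - pull) * w + pull * anchor   # pull-back toward the anchor

# Toy usage on the quadratic objective 0.5 * ||w||^2 (hypothetical example).
rng = np.random.default_rng(0)
w, anchor = rng.normal(size=3), np.zeros(3)
for _ in range(3):
    w = local_round_with_anchor(w, anchor, grad_fn=lambda x: x)
    anchor = 0.5 * (anchor + w)               # anchor lazily tracks the local model
```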
arXiv Detail & Related papers (2020-02-21T20:33:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.