A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients
- URL: http://arxiv.org/abs/2010.13359v1
- Date: Mon, 26 Oct 2020 06:06:43 GMT
- Title: A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients
- Authors: Xiaojun Chen and Shu Yang and Li Shen and Xuanrong Pang
- Abstract summary: We propose a distributed GAN training algorithm with quantized gradients, dubbed DQGAN, which is the first distributed training method with quantized gradients for GANs.
The new method trains GANs based on a single-machine algorithm called the Optimistic Mirror Descent (OMD) algorithm, and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the parameter server model.
- Score: 8.202072658184166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training generative adversarial networks (GANs) in a distributed
fashion is a promising technology, since it makes it possible to train GANs on
massive amounts of data efficiently in real-world applications. However, GANs
are known to be difficult to train with SGD-type methods (which may fail to
converge), and distributed SGD-type methods may also suffer from a massive
amount of communication cost. In this paper, we propose a distributed GAN
training algorithm with quantized gradients, dubbed DQGAN, which is the first
distributed training method with quantized gradients for GANs. The new method
trains GANs based on a single-machine algorithm called the Optimistic Mirror
Descent (OMD) algorithm, and is applicable to any gradient compression method
that satisfies a general $\delta$-approximate compressor. The error-feedback
operation we designed compensates for the bias caused by compression and,
moreover, ensures the convergence of the new method. Theoretically, we
establish the non-asymptotic convergence of the DQGAN algorithm to a
first-order stationary point, which shows that the proposed algorithm can
achieve a linear speedup in the parameter server model. Empirically, our
experiments show that our DQGAN algorithm can reduce the communication cost
and save training time with only slight performance degradation on both
synthetic and real datasets.
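The convergence guarantee above applies to any compressor satisfying the $\delta$-approximate condition. The paper's exact statement is not reproduced here, but in the error-feedback literature (e.g., Karimireddy et al., 2019) the condition is usually written as

\[
\mathbb{E}\,\bigl\|\mathcal{C}(x) - x\bigr\|^{2} \;\le\; (1-\delta)\,\|x\|^{2}
\qquad \text{for all } x \in \mathbb{R}^{d},\; \delta \in (0,1],
\]

where $\mathcal{C}:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is the (possibly randomized) compression operator; Top-k sparsification (with $\delta = k/d$) and scaled sign quantization are standard examples.

The abstract gives no pseudocode, so the following single-worker sketch is only an illustration of how OMD can be combined with error-feedback compression; the scaled-sign compressor, function names, and step size are assumptions of ours, not the paper's method.

```python
import numpy as np

def scaled_sign_compress(g):
    """Illustrative 1-bit compressor: the sign of each coordinate scaled by
    the mean absolute value (compressors of this form satisfy the
    delta-approximate condition above for some delta > 0)."""
    return np.abs(g).mean() * np.sign(g)

def omd_ef_step(theta, grad_fn, prev_msg, error, lr=0.05):
    """One optimistic mirror descent (OMD) update with error feedback.
    theta    : current parameters
    grad_fn  : callable returning a stochastic gradient at theta
    prev_msg : compressed (lr-scaled) gradient from the previous step
    error    : residual dropped by the compressor at the previous step
    """
    g = grad_fn(theta)
    corrected = lr * g + error             # error feedback: re-inject residual
    msg = scaled_sign_compress(corrected)  # what a worker would transmit
    new_error = corrected - msg            # carry the compression error forward
    # OMD-style update: theta_{t+1} = theta_t - 2*eta*g_t + eta*g_{t-1},
    # applied here to the compressed, lr-scaled gradients.
    theta_next = theta - 2.0 * msg + prev_msg
    return theta_next, msg, new_error

# Toy usage on a quadratic loss (a stand-in for the generator/discriminator
# objectives in a real GAN).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=10)
    grad_fn = lambda x: 2.0 * x + 0.01 * rng.normal(size=x.shape)
    msg, err = np.zeros_like(theta), np.zeros_like(theta)
    for _ in range(300):
        theta, msg, err = omd_ef_step(theta, grad_fn, msg, err)
    print("final parameter norm:", np.linalg.norm(theta))
```

In the full distributed setting described in the abstract, each worker would send its `msg` to a parameter server, which averages the messages and broadcasts the updated generator and discriminator parameters.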
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{\,1 - \frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\!\left(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^{2} T}\right)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - On Accelerating Distributed Convex Optimizations [0.0]
This paper studies a distributed multi-agent convex optimization problem.
We show that the proposed algorithm converges linearly with an improved rate of convergence compared to traditional and adaptive gradient-descent methods.
We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems.
arXiv Detail & Related papers (2021-08-19T13:19:54Z) - Practical Convex Formulation of Robust One-hidden-layer Neural Network
Training [12.71266194474117]
We show that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program.
We derive a convex optimization approach to efficiently solve the "adversarial training" problem.
Our method can be applied to binary classification and regression, and provides an alternative to the current adversarial training methods.
arXiv Detail & Related papers (2021-05-25T22:06:27Z) - An Efficient Statistical-based Gradient Compression Technique for
Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively; a minimal parameter-server round with a Top-k-style compressor is sketched after this list.
arXiv Detail & Related papers (2021-01-26T13:06:00Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds while retaining its theoretical convergence guarantees.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to
Balance Communication Overhead, Computational Complexity, and Convergence
Rate [28.167294398293297]
We propose a distributed stochastic gradient descent (SGD) method with low communication load and low computational complexity that still converges quickly.
To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order gradient estimation.
arXiv Detail & Related papers (2020-03-27T14:02:15Z)
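Several of the papers above (FO-SGD, SIDCo, SketchedAMSGrad), like DQGAN itself, operate in the parameter-server model: each worker compresses its local gradient, and the server averages the received messages and broadcasts an update. The following is a minimal sketch of one synchronous round with an illustrative Top-k sparsifier; all function names and the choice of compressor are assumptions for illustration, not taken from any of the cited papers.

```python
import numpy as np

def top_k_sparsify(g, k):
    """Keep the k largest-magnitude coordinates and zero the rest, as in
    Top-k/DGC-style sparsification (delta-approximate with delta = k/d)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def parameter_server_round(theta, worker_grad_fns, k, lr=0.05):
    """One synchronous round: each worker compresses its local stochastic
    gradient, the server averages the compressed messages and takes a plain
    SGD step. Error feedback is omitted to keep the sketch minimal."""
    msgs = [top_k_sparsify(fn(theta), k) for fn in worker_grad_fns]
    avg = np.mean(msgs, axis=0)   # server-side aggregation
    return theta - lr * avg       # broadcast the updated parameters

# Toy usage: 4 workers with noisy gradients of the same least-squares loss.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_workers = 50, 4
    target = rng.normal(size=d)
    grad_fns = [lambda x: 2.0 * (x - target) + 0.1 * rng.normal(size=d)
                for _ in range(n_workers)]
    theta = np.zeros(d)
    for _ in range(500):
        theta = parameter_server_round(theta, grad_fns, k=10)
    print("distance to target:", np.linalg.norm(theta - target))
```

Communicating only the k selected values and their indices is what yields the $O(k \log(d))$-style per-iteration communication costs quoted above.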
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.