A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients
- URL: http://arxiv.org/abs/2010.13359v1
- Date: Mon, 26 Oct 2020 06:06:43 GMT
- Title: A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients
- Authors: Xiaojun Chen and Shu Yang and Li Shen and Xuanrong Pang
- Abstract summary: We propose a distributed GAN training algorithm with quantized gradients, dubbed DQGAN, which is the first distributed training method with quantized gradients for GANs.
The new method trains GANs based on a single-machine algorithm called the Optimistic Mirror Descent (OMD) algorithm, and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the parameter server model.
- Score: 8.202072658184166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training generative adversarial networks (GANs) in a distributed
fashion is a promising technology, since it makes it possible to train GANs on
massive amounts of data efficiently in real-world applications. However, GANs
are known to be difficult to train with SGD-type methods (which may fail to
converge), and distributed SGD-type methods may also suffer from a massive
amount of communication cost. In this paper, we propose a distributed GAN
training algorithm with quantized gradients, dubbed DQGAN, which is the first
distributed training method with quantized gradients for GANs. The new method
trains GANs based on a single-machine algorithm called the Optimistic Mirror
Descent (OMD) algorithm, and is applicable to any gradient compression method
that satisfies a general $\delta$-approximate compressor. The error-feedback
operation we designed compensates for the bias caused by compression and,
moreover, ensures the convergence of the new method. Theoretically, we
establish the non-asymptotic convergence of the DQGAN algorithm to a
first-order stationary point, which shows that the proposed algorithm can
achieve a linear speedup in the parameter server model. Empirically, our
experiments show that our DQGAN algorithm can reduce the communication cost
and save training time with only slight performance degradation on both
synthetic and real datasets.
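The convergence guarantee above applies to any compressor satisfying the $\delta$-approximate condition. The paper's exact statement is not reproduced here, but in the error-feedback literature (e.g., Karimireddy et al., 2019) the condition is usually written as

\[
\mathbb{E}\,\bigl\|\mathcal{C}(x) - x\bigr\|^{2} \;\le\; (1-\delta)\,\|x\|^{2}
\qquad \text{for all } x \in \mathbb{R}^{d},\; \delta \in (0,1],
\]

where $\mathcal{C}:\mathbb{R}^{d}\to\mathbb{R}^{d}$ is the (possibly randomized) compression operator; Top-k sparsification (with $\delta = k/d$) and scaled sign quantization are standard examples.

The abstract gives no pseudocode, so the following single-worker sketch is only an illustration of how OMD can be combined with error-feedback compression; the scaled-sign compressor, function names, and step size are assumptions of ours, not the paper's method.

```python
import numpy as np

def scaled_sign_compress(g):
    """Illustrative 1-bit compressor: the sign of each coordinate scaled by
    the mean absolute value (compressors of this form satisfy the
    delta-approximate condition above for some delta > 0)."""
    return np.abs(g).mean() * np.sign(g)

def omd_ef_step(theta, grad_fn, prev_msg, error, lr=0.05):
    """One optimistic mirror descent (OMD) update with error feedback.
    theta    : current parameters
    grad_fn  : callable returning a stochastic gradient at theta
    prev_msg : compressed (lr-scaled) gradient from the previous step
    error    : residual dropped by the compressor at the previous step
    """
    g = grad_fn(theta)
    corrected = lr * g + error             # error feedback: re-inject residual
    msg = scaled_sign_compress(corrected)  # what a worker would transmit
    new_error = corrected - msg            # carry the compression error forward
    # OMD-style update: theta_{t+1} = theta_t - 2*eta*g_t + eta*g_{t-1},
    # applied here to the compressed, lr-scaled gradients.
    theta_next = theta - 2.0 * msg + prev_msg
    return theta_next, msg, new_error

# Toy usage on a quadratic loss (a stand-in for the generator/discriminator
# objectives in a real GAN).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=10)
    grad_fn = lambda x: 2.0 * x + 0.01 * rng.normal(size=x.shape)
    msg, err = np.zeros_like(theta), np.zeros_like(theta)
    for _ in range(300):
        theta, msg, err = omd_ef_step(theta, grad_fn, msg, err)
    print("final parameter norm:", np.linalg.norm(theta))
```

In the full distributed setting described in the abstract, each worker would send its `msg` to a parameter server, which averages the messages and broadcasts the updated generator and discriminator parameters.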
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{\,1 - \frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\!\left(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^{2} T}\right)$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - On Accelerating Distributed Convex Optimizations [0.0]
This paper studies a distributed multi-agent convex optimization problem.
We show that the proposed algorithm converges linearly with an improved rate of convergence compared to traditional and adaptive gradient-descent methods.
We demonstrate our algorithm's superior performance compared to prominent distributed algorithms for solving real logistic regression problems.
arXiv Detail & Related papers (2021-08-19T13:19:54Z) - Practical Convex Formulation of Robust One-hidden-layer Neural Network
Training [12.71266194474117]
We show that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program.
We derive a convex optimization approach to efficiently solve the "adversarial training" problem.
Our method can be applied to binary classification and regression, and provides an alternative to the current adversarial training methods.
arXiv Detail & Related papers (2021-05-25T22:06:27Z) - An Efficient Statistical-based Gradient Compression Technique for
Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively; a minimal parameter-server round with a Top-k-style compressor is sketched after this list.
arXiv Detail & Related papers (2021-01-26T13:06:00Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds while retaining its theoretical convergence guarantees.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to
Balance Communication Overhead, Computational Complexity, and Convergence
Rate [28.167294398293297]
We propose a distributed stochastic gradient descent (SGD) method with low communication load and low computational complexity that still converges quickly.
To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order gradient estimation.
arXiv Detail & Related papers (2020-03-27T14:02:15Z)
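Several of the papers above (FO-SGD, SIDCo, SketchedAMSGrad), like DQGAN itself, operate in the parameter-server model: each worker compresses its local gradient, and the server averages the received messages and broadcasts an update. The following is a minimal sketch of one synchronous round with an illustrative Top-k sparsifier; all function names and the choice of compressor are assumptions for illustration, not taken from any of the cited papers.

```python
import numpy as np

def top_k_sparsify(g, k):
    """Keep the k largest-magnitude coordinates and zero the rest, as in
    Top-k/DGC-style sparsification (delta-approximate with delta = k/d)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def parameter_server_round(theta, worker_grad_fns, k, lr=0.05):
    """One synchronous round: each worker compresses its local stochastic
    gradient, the server averages the compressed messages and takes a plain
    SGD step. Error feedback is omitted to keep the sketch minimal."""
    msgs = [top_k_sparsify(fn(theta), k) for fn in worker_grad_fns]
    avg = np.mean(msgs, axis=0)   # server-side aggregation
    return theta - lr * avg       # broadcast the updated parameters

# Toy usage: 4 workers with noisy gradients of the same least-squares loss.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_workers = 50, 4
    target = rng.normal(size=d)
    grad_fns = [lambda x: 2.0 * (x - target) + 0.1 * rng.normal(size=d)
                for _ in range(n_workers)]
    theta = np.zeros(d)
    for _ in range(500):
        theta = parameter_server_round(theta, grad_fns, k=10)
    print("distance to target:", np.linalg.norm(theta - target))
```

Communicating only the k selected values and their indices is what yields the $O(k \log(d))$-style per-iteration communication costs quoted above.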
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.