Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing
Performance
- URL: http://arxiv.org/abs/2203.02770v1
- Date: Sat, 5 Mar 2022 15:18:03 GMT
- Title: Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing
Performance
- Authors: Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen
- Abstract summary: Generative adversarial networks (GANs) have attracted surging interest since their introduction, owing to the high quality of the data they generate.
For inference, existing model compression techniques can reduce model complexity while retaining comparable performance.
In this paper, we explore the possibility of directly training sparse GANs from scratch without involving any dense or pre-training steps.
- Score: 47.94567935516651
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative adversarial networks (GANs) have attracted surging
interest since their introduction, owing to the high quality of the data they
generate. While achieving increasingly impressive results, the resource demands
associated with their large model size hinder the use of GANs in
resource-limited scenarios. For inference, existing model compression
techniques can reduce model complexity while retaining comparable performance.
However, the training efficiency of GANs has been less explored due to their
fragile training process. In this paper, we, for the first time, explore the
possibility of directly training sparse GANs from scratch without involving any
dense or pre-training steps. Even more unconventionally, our proposed method
enables directly training sparse, unbalanced GANs with an extremely sparse
generator from scratch. Instead of training full GANs, we start with sparse
GANs and dynamically explore the parameter space spanned by the generator
throughout training. Such a sparse-to-sparse training procedure progressively
enhances the capacity of the highly sparse generator while adhering to a fixed
small parameter budget, with appealing training and inference efficiency gains.
Extensive experiments with modern GAN architectures validate the effectiveness
of our method. Our sparsified GANs, trained from scratch in a single run,
outperform those learned by expensive iterative pruning and re-training.
Perhaps most importantly, we find that directly training sparse GANs from
scratch can be a much more efficient solution than inheriting parameters from
expensive pre-trained GANs. For example, training with only an 80% sparse
generator and a 70% sparse discriminator, our method achieves even better
performance than the dense BigGAN.
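The sparse-to-sparse procedure described above keeps a fixed parameter budget while periodically re-wiring the generator's connectivity. The sketch below illustrates the generic drop-and-grow step used in dynamic sparse training, not the authors' exact method: at each update interval, the smallest-magnitude active weights are pruned and an equal number of inactive connections are regrown (random growth here; gradient-magnitude growth is a common alternative). All names and the layer shape are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sparse(shape, sparsity, rng):
    """Initialize a weight matrix with a fixed fraction of zeroed connections."""
    w = rng.standard_normal(shape).astype(np.float32)
    mask = rng.random(shape) > sparsity  # True = active connection
    return w * mask, mask

def drop_and_grow(w, mask, drop_frac, rng):
    """One dynamic-sparse-training step: prune the smallest-magnitude active
    weights, then regrow the same number of inactive connections, keeping
    the total parameter budget constant."""
    n_active = int(mask.sum())
    n_drop = int(drop_frac * n_active)
    if n_drop == 0:
        return w, mask
    # Drop: zero out the n_drop active weights with the smallest magnitude.
    active_idx = np.flatnonzero(mask)
    order = np.argsort(np.abs(w.ravel()[active_idx]))
    dropped = active_idx[order[:n_drop]]
    mask.ravel()[dropped] = False
    w.ravel()[dropped] = 0.0
    # Grow: re-activate n_drop inactive connections with a small init.
    inactive_idx = np.flatnonzero(~mask)
    grown = rng.choice(inactive_idx, size=n_drop, replace=False)
    mask.ravel()[grown] = True
    w.ravel()[grown] = 0.01 * rng.standard_normal(n_drop)
    return w, mask

# One sparse generator layer at 80% sparsity, re-wired over several intervals.
w, mask = init_sparse((64, 64), sparsity=0.8, rng=rng)
budget = int(mask.sum())
for _ in range(5):
    w, mask = drop_and_grow(w, mask, drop_frac=0.3, rng=rng)
    assert int(mask.sum()) == budget  # parameter budget never changes
```

The key property this preserves is that connectivity changes while the active-parameter count stays fixed, so training and inference costs remain those of the sparse model throughout.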
Related papers
- ESRL: Efficient Sampling-based Reinforcement Learning for Sequence
Generation [43.506732624371786]
We introduce two-stage sampling and dynamic sampling approaches to improve sampling efficiency when training sequence generation models via RL.
Experimental results show that the efficient sampling-based RL, referred to as ESRL, can outperform all baselines in terms of both training efficiency and memory consumption.
arXiv Detail & Related papers (2023-08-04T09:35:45Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Balanced Training for Sparse GANs [16.045866864231417]
We propose a novel metric called the balance ratio (BR) to study the balance between the sparse generator and discriminator.
We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.
arXiv Detail & Related papers (2023-02-28T15:34:01Z) - Dynamic Sparse Training via Balancing the Exploration-Exploitation
Trade-off [19.230329532065635]
Sparse training could significantly mitigate the training costs by reducing the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we consider the dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments gradient estimation with off-policy updates and a first-order Taylor expansion.
It enables us to train NLG models from scratch with smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z) - A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients [8.202072658184166]
We propose a distributed GAN training algorithm with quantized gradients, dubbed DQGAN, which is the first distributed training method with quantized gradients for GANs.
The new method trains GANs based on a specific single-machine algorithm called Optimistic Mirror Descent (OMD), and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor condition.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the
arXiv Detail & Related papers (2020-10-26T06:06:43Z) - FusedProp: Towards Efficient Training of Generative Adversarial Networks [0.0]
We propose the fused propagation algorithm which can be used to efficiently train the discriminator and the generator of common GANs simultaneously.
We show that FusedProp achieves 1.49 times the training speed of conventional GAN training.
arXiv Detail & Related papers (2020-03-30T06:46:29Z)
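The $\delta$-approximate compressor assumed by DQGAN above is any operator $C$ with $\|C(x) - x\|^2 \le (1 - \delta)\|x\|^2$. A minimal sketch, assuming top-k sparsification as the compressor (a standard example satisfying the bound with $\delta = k/d$; the DQGAN paper itself does not mandate this choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the
    rest. A delta-approximate compressor with delta = k/d."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

d, k = 1000, 100
x = rng.standard_normal(d)  # stand-in for a gradient vector
c = top_k(x, k)
delta = k / d
# Compressor contract: ||C(x) - x||^2 <= (1 - delta) * ||x||^2.
# It holds because the discarded d-k entries are the smallest in magnitude.
err = np.linalg.norm(c - x) ** 2
assert err <= (1 - delta) * np.linalg.norm(x) ** 2
```

Any compressor meeting this contract (e.g. certain quantizers) can plug into such a scheme; the bound is what the convergence analysis relies on, not the specific compression method.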
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.