Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing
Performance
- URL: http://arxiv.org/abs/2203.02770v1
- Date: Sat, 5 Mar 2022 15:18:03 GMT
- Title: Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing
Performance
- Authors: Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen
- Abstract summary: Generative adversarial networks (GANs) have attracted surging interest since their introduction, owing to the high quality of the data they generate.
For inference, existing model compression techniques can reduce model complexity while retaining comparable performance.
In this paper, we explore the possibility of directly training sparse GANs from scratch without involving any dense or pre-training steps.
- Score: 47.94567935516651
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative adversarial networks (GANs) have attracted surging
interest since their introduction, owing to the high quality of the data they
generate. While achieving increasingly impressive results, the resource demands
associated with their large model size hinder the use of GANs in
resource-limited scenarios. For inference, existing model compression
techniques can reduce model complexity while retaining comparable performance.
However, the training efficiency of GANs has been less explored due to their
fragile training process. In this paper, we, for the first time, explore the
possibility of directly training sparse GANs from scratch without involving any
dense or pre-training steps. Even more unconventionally, our proposed method
enables directly training sparse, unbalanced GANs with an extremely sparse
generator from scratch. Instead of training full GANs, we start with sparse
GANs and dynamically explore the parameter space spanned by the generator
throughout training. Such a sparse-to-sparse training procedure progressively
enhances the capacity of the highly sparse generator while adhering to a fixed
small parameter budget, with appealing training and inference efficiency gains.
Extensive experiments with modern GAN architectures validate the effectiveness
of our method. Our sparsified GANs, trained from scratch in a single run,
outperform those learned by expensive iterative pruning and re-training.
Perhaps most importantly, we find that directly training sparse GANs from
scratch can be a much more efficient solution than inheriting parameters from
expensive pre-trained GANs. For example, training with only an 80% sparse
generator and a 70% sparse discriminator, our method achieves even better
performance than the dense BigGAN.
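The sparse-to-sparse procedure described above keeps a fixed parameter budget while periodically re-wiring the generator's connectivity. The sketch below illustrates the generic drop-and-grow step used in dynamic sparse training, not the authors' exact method: at each update interval, the smallest-magnitude active weights are pruned and an equal number of inactive connections are regrown (random growth here; gradient-magnitude growth is a common alternative). All names and the layer shape are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sparse(shape, sparsity, rng):
    """Initialize a weight matrix with a fixed fraction of zeroed connections."""
    w = rng.standard_normal(shape).astype(np.float32)
    mask = rng.random(shape) > sparsity  # True = active connection
    return w * mask, mask

def drop_and_grow(w, mask, drop_frac, rng):
    """One dynamic-sparse-training step: prune the smallest-magnitude active
    weights, then regrow the same number of inactive connections, keeping
    the total parameter budget constant."""
    n_active = int(mask.sum())
    n_drop = int(drop_frac * n_active)
    if n_drop == 0:
        return w, mask
    # Drop: zero out the n_drop active weights with the smallest magnitude.
    active_idx = np.flatnonzero(mask)
    order = np.argsort(np.abs(w.ravel()[active_idx]))
    dropped = active_idx[order[:n_drop]]
    mask.ravel()[dropped] = False
    w.ravel()[dropped] = 0.0
    # Grow: re-activate n_drop inactive connections with a small init.
    inactive_idx = np.flatnonzero(~mask)
    grown = rng.choice(inactive_idx, size=n_drop, replace=False)
    mask.ravel()[grown] = True
    w.ravel()[grown] = 0.01 * rng.standard_normal(n_drop)
    return w, mask

# One sparse generator layer at 80% sparsity, re-wired over several intervals.
w, mask = init_sparse((64, 64), sparsity=0.8, rng=rng)
budget = int(mask.sum())
for _ in range(5):
    w, mask = drop_and_grow(w, mask, drop_frac=0.3, rng=rng)
    assert int(mask.sum()) == budget  # parameter budget never changes
```

The key property this preserves is that connectivity changes while the active-parameter count stays fixed, so training and inference costs remain those of the sparse model throughout.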
Related papers
- ESRL: Efficient Sampling-based Reinforcement Learning for Sequence
Generation [43.506732624371786]
We introduce two-stage sampling and dynamic sampling approaches to improve sampling efficiency when training sequence generation models via RL.
Experimental results show that the efficient sampling-based RL, referred to as ESRL, can outperform all baselines in terms of both training efficiency and memory consumption.
arXiv Detail & Related papers (2023-08-04T09:35:45Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - Balanced Training for Sparse GANs [16.045866864231417]
We propose a novel metric called the balance ratio (BR) to study the balance between the sparse generator and discriminator.
We also introduce a new method called balanced dynamic sparse training (ADAPT), which seeks to control the BR during GAN training to achieve a good trade-off between performance and computational cost.
arXiv Detail & Related papers (2023-02-28T15:34:01Z) - Dynamic Sparse Training via Balancing the Exploration-Exploitation
Trade-off [19.230329532065635]
Sparse training could significantly mitigate the training costs by reducing the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we consider the dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
arXiv Detail & Related papers (2021-06-23T13:23:00Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - TaylorGAN: Neighbor-Augmented Policy Update for Sample-Efficient Natural
Language Generation [79.4205462326301]
TaylorGAN is a novel approach to score function-based natural language generation.
It augments gradient estimation with off-policy updates and a first-order Taylor expansion.
It enables us to train NLG models from scratch with smaller batch size.
arXiv Detail & Related papers (2020-11-27T02:26:15Z) - A Distributed Training Algorithm of Generative Adversarial Networks with
Quantized Gradients [8.202072658184166]
We propose a distributed GAN training algorithm with quantized gradients, dubbed DQGAN, which is the first distributed training method with quantized gradients for GANs.
The new method trains GANs based on a specific single-machine algorithm called Optimistic Mirror Descent (OMD), and is applicable to any gradient compression method that satisfies a general $\delta$-approximate compressor condition.
Theoretically, we establish the non-asymptotic convergence of the DQGAN algorithm to a first-order stationary point, which shows that the proposed algorithm can achieve a linear speedup in the
arXiv Detail & Related papers (2020-10-26T06:06:43Z) - FusedProp: Towards Efficient Training of Generative Adversarial Networks [0.0]
We propose the fused propagation algorithm which can be used to efficiently train the discriminator and the generator of common GANs simultaneously.
We show that FusedProp achieves 1.49 times the training speed of conventional GAN training.
arXiv Detail & Related papers (2020-03-30T06:46:29Z)
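The $\delta$-approximate compressor assumed by DQGAN above is any operator $C$ with $\|C(x) - x\|^2 \le (1 - \delta)\|x\|^2$. A minimal sketch, assuming top-k sparsification as the compressor (a standard example satisfying the bound with $\delta = k/d$; the DQGAN paper itself does not mandate this choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def top_k(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries, zero the
    rest. A delta-approximate compressor with delta = k/d."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

d, k = 1000, 100
x = rng.standard_normal(d)  # stand-in for a gradient vector
c = top_k(x, k)
delta = k / d
# Compressor contract: ||C(x) - x||^2 <= (1 - delta) * ||x||^2.
# It holds because the discarded d-k entries are the smallest in magnitude.
err = np.linalg.norm(c - x) ** 2
assert err <= (1 - delta) * np.linalg.norm(x) ** 2
```

Any compressor meeting this contract (e.g. certain quantizers) can plug into such a scheme; the bound is what the convergence analysis relies on, not the specific compression method.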
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.