Understanding Overparameterization in Generative Adversarial Networks
- URL: http://arxiv.org/abs/2104.05605v1
- Date: Mon, 12 Apr 2021 16:23:37 GMT
- Title: Understanding Overparameterization in Generative Adversarial Networks
- Authors: Yogesh Balaji, Mohammadmahdi Sajedi, Neha Mukund Kalibhat, Mucong
Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi
- Abstract summary: Generative Adversarial Networks (GANs) involve training non-convex concave mini-max optimization problems.
A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem.
- Score: 56.57403335510056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A broad class of unsupervised deep learning methods such as Generative
Adversarial Networks (GANs) involve training of overparameterized models where
the number of parameters of the model exceeds a certain threshold. A large body
of work in supervised learning has shown the importance of model
overparameterization in the convergence of the gradient descent (GD) to
globally optimal solutions. In contrast, the unsupervised setting and GANs in
particular involve non-convex concave mini-max optimization problems that are
often trained using Gradient Descent/Ascent (GDA). The role and benefits of
model overparameterization in the convergence of GDA to a global saddle point
in non-convex concave problems are far less understood. In this work, we present
a comprehensive analysis of the importance of model overparameterization in
GANs both theoretically and empirically. We theoretically show that in an
overparameterized GAN model with a $1$-layer neural network generator and a
linear discriminator, GDA converges to a global saddle point of the underlying
non-convex concave min-max problem. To the best of our knowledge, this is the
first result for global convergence of GDA in such settings. Our theory is
based on a more general result that holds for a broader class of nonlinear
generators and discriminators that obey certain assumptions (including deeper
generators and random feature discriminators). We also empirically study the
role of model overparameterization in GANs using several large-scale
experiments on CIFAR-10 and Celeb-A datasets. Our experiments show that
overparameterization improves the quality of generated samples across various
model architectures and datasets. Remarkably, we observe that
overparameterization leads to faster and more stable convergence behavior of
GDA across the board.
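For intuition about the setting analyzed in the theory, here is a minimal simulation sketch. It is illustrative only, not the authors' exact construction: a one-hidden-layer generator is trained against a linear discriminator with simultaneous gradient descent/ascent (GDA); the Gaussian toy data, the WGAN-style objective, the quadratic regularizer on the discriminator, and all widths and step sizes are assumptions made for this example.

```python
# Minimal GDA sketch: one-hidden-layer generator vs. linear discriminator.
# Illustrative assumptions (not the paper's exact construction): Gaussian toy
# data, a WGAN-style linear objective with a quadratic regularizer on the
# discriminator, and simultaneous gradient descent/ascent with fixed steps.
import torch

torch.manual_seed(0)
d, k, width = 8, 4, 512            # data dim, latent dim, generator width
eta, lam, steps, batch = 0.05, 1.0, 2000, 256

# Generator G(z) = W2 relu(W1 z); linear discriminator D(x) = <v, x>.
W1 = (torch.randn(width, k) / k ** 0.5).requires_grad_(True)
W2 = (torch.randn(d, width) / width ** 0.5).requires_grad_(True)
v = torch.zeros(d, requires_grad=True)

real_mean = torch.ones(d)          # toy target distribution N(1, I)

for t in range(steps):
    z = torch.randn(batch, k)
    x_real = real_mean + torch.randn(batch, d)
    x_fake = torch.relu(z @ W1.T) @ W2.T
    # Min-max objective f(G, v) = E[D(x_real)] - E[D(G(z))] - (lam/2)||v||^2.
    f = (x_real @ v).mean() - (x_fake @ v).mean() - 0.5 * lam * (v * v).sum()
    g_W1, g_W2, g_v = torch.autograd.grad(f, (W1, W2, v))
    with torch.no_grad():
        W1 -= eta * g_W1           # generator: gradient descent on f
        W2 -= eta * g_W2
        v += eta * g_v             # discriminator: gradient ascent on f

with torch.no_grad():
    gen_mean = torch.relu(torch.randn(4096, k) @ W1.T) @ W2.T
    print("mean gap:", (gen_mean.mean(0) - real_mean).norm().item())
```

Increasing `width` in this toy setup is one way to probe empirically how overparameterization changes the GDA trajectory, in the spirit of the paper's experiments.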
Related papers
- On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Toward the Identifiability of Comparative Deep Generative Models [7.5479347719819865]
We propose a theory of identifiability for comparative Deep Generative Models (DGMs).
We show that, while these models lack identifiability across a general class of mixing functions, they surprisingly become identifiable when the mixing function is piece-wise affine.
We also investigate the impact of model misspecification, and empirically show that previously proposed regularization techniques for fitting comparative DGMs help with identifiability when the number of latent variables is not known in advance.
arXiv Detail & Related papers (2024-01-29T06:10:54Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error of overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data [0.261072980439312]
We propose a unified momentum-based paradigm (UMP) with two algorithms, D-MP and GT-D, which provides convergence guarantees for general non-convex objectives.
In theory, we provide the convergence analysis of these two approaches under the unified paradigm.
arXiv Detail & Related papers (2023-03-01T02:13:22Z)
- Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models [32.52492468276371]
We propose a regularized deep generative model (Reg-DGM) to reduce the variance of generative modeling with limited data.
Reg-DGM uses a pre-trained model to optimize a weighted sum of a certain divergence and the expectation of an energy function.
Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data.
arXiv Detail & Related papers (2022-08-30T10:28:50Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop a deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
- Deep Latent-Variable Kernel Learning [25.356503463916816]
We present a complete deep latent-variable kernel learning (DLVKL) model wherein the latent variables perform encoding for regularized representation.
Experiments imply that the DLVKL-NSDE performs similarly to the well calibrated GP on small datasets, and outperforms existing deep GPs on large datasets.
arXiv Detail & Related papers (2020-05-18T05:55:08Z)