GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial
Networks
- URL: http://arxiv.org/abs/2111.03162v1
- Date: Thu, 4 Nov 2021 21:13:02 GMT
- Title: GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial
Networks
- Authors: Vineeth S. Bhaskara, Tristan Aumentado-Armstrong, Allan Jepson, Alex
Levinshtein
- Abstract summary: Modern generative adversarial networks (GANs) predominantly use piecewise linear activation functions in discriminators (or critics).
We present Gradient Normalization (GraN), a novel input-dependent normalization method, which guarantees a piecewise K-Lipschitz constraint in the input space.
GraN does not constrain processing at the individual network layers, and, unlike gradient penalties, strictly enforces a piecewise Lipschitz constraint almost everywhere.
- Score: 2.3666095711348363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern generative adversarial networks (GANs) predominantly use piecewise
linear activation functions in discriminators (or critics), including ReLU and
LeakyReLU. Such models learn piecewise linear mappings, where each piece
handles a subset of the input space, and the gradients per subset are piecewise
constant. Under such a class of discriminator (or critic) functions, we present
Gradient Normalization (GraN), a novel input-dependent normalization method,
which guarantees a piecewise K-Lipschitz constraint in the input space. In
contrast to spectral normalization, GraN does not constrain processing at the
individual network layers, and, unlike gradient penalties, strictly enforces a
piecewise Lipschitz constraint almost everywhere. Empirically, we demonstrate
improved image generation performance across multiple datasets (incl.
CIFAR-10/100, STL-10, LSUN bedrooms, and CelebA), GAN loss functions, and
metrics. Further, we analyze altering the often untuned Lipschitz constant K in
several standard GANs, not only attaining significant performance gains, but
also finding connections between K and training dynamics, particularly in
low-gradient loss plateaus, with the common Adam optimizer.
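The core normalization idea can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: it builds a tiny ReLU network (a piecewise-linear map whose input gradient is piecewise constant), then scales the scalar critic output by K divided by its input-gradient norm, so that within each linear region the normalized map has gradient norm exactly K. The network weights, the test point, and `K = 1.0` are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 2))   # hidden weights (arbitrary demo values)
b1 = rng.standard_normal(16)
w2 = rng.standard_normal(16)        # output weights
K = 1.0                             # target piecewise Lipschitz constant

def critic(x):
    """Scalar piecewise-linear critic f(x) = w2 . relu(W1 x + b1)."""
    h = W1 @ x + b1
    return w2 @ np.maximum(h, 0.0)

def critic_grad(x):
    """Exact input gradient; piecewise constant (fixed ReLU activation pattern)."""
    mask = (W1 @ x + b1 > 0).astype(float)
    return (w2 * mask) @ W1

def gran(x, eps=1e-12):
    """Gradient-normalized critic: K * f(x) / ||grad f(x)||.
    Within one linear region, the gradient norm of this map is exactly K."""
    g = critic_grad(x)
    return K * critic(x) / (np.linalg.norm(g) + eps)

# Check: the numerical gradient of the normalized critic has norm ~ K
x = np.array([0.3, -0.2])
d = 1e-6
num_grad = np.array([(gran(x + d * e) - gran(x - d * e)) / (2 * d)
                     for e in np.eye(2)])
print(np.linalg.norm(num_grad))   # ≈ K = 1.0
```

Note that, as the abstract states, the constraint is *piecewise*: the normalized map is still discontinuous in its gradient across region boundaries, which is why the guarantee holds almost everywhere rather than globally.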
Related papers
- CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization [36.20084231028338]
Generative Adversarial Networks (GANs) have significantly advanced image generation, but their performance depends heavily on abundant training data.
In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training.
We present CHAIN, which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step.
arXiv Detail & Related papers (2024-03-31T01:41:36Z)
- The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Fast Convergence in Learning Two-Layer Neural Networks with Separable Data [37.908159361149835]
We study normalized gradient descent on two-layer neural nets.
We prove for exponentially-tailed losses that using normalized GD leads to linear rate of convergence of the training loss to the global optimum.
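Normalized gradient descent, the method studied above, simply rescales each step to unit gradient norm, which keeps step sizes large even on the flat tails of exponentially-tailed losses. A minimal sketch on logistic loss with a small linearly separable 2-D dataset (the data, step size, and iteration count are illustrative choices, not the paper's setup):

```python
import numpy as np

# Tiny linearly separable dataset (illustrative placeholder)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def loss_and_grad(w):
    """Logistic (exponentially-tailed) loss and its gradient."""
    m = y * (X @ w)                              # per-example margins
    loss = np.mean(np.log1p(np.exp(-m)))
    g = -(y / (1.0 + np.exp(m))) @ X / len(y)
    return loss, g

w = np.zeros(2)
eta = 0.5
for _ in range(200):
    loss, g = loss_and_grad(w)
    w -= eta * g / (np.linalg.norm(g) + 1e-12)   # normalized GD step

print(loss)   # driven toward 0 as ||w|| grows along the max-margin direction
```

With plain (unnormalized) GD the gradient itself vanishes exponentially as the margins grow, so progress stalls; normalizing the step is what yields the fast convergence of the training loss described above.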
arXiv Detail & Related papers (2023-05-22T20:30:10Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalization of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- Sharper analysis of sparsely activated wide neural networks with trainable biases [103.85569570164404]
This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime.
Surprisingly, it is shown that the network after sparsification can achieve as fast convergence as the original network.
Since the generalization bound has dependence on the smallest eigenvalue of the limiting NTK, this work further studies the least eigenvalue of the limiting NTK.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z)
- Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification [3.158346511479111]
We propose a class of STEs with certain monotonicity, and consider their applications to the training of a two-linear-layer network with quantized activation functions.
We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum.
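The straight-through estimator (STE) idea behind such coarse gradients can be illustrated in a few lines: the forward pass uses a hard quantizer (here `sign`), while the backward pass substitutes a surrogate derivative (here the indicator of a clipped interval, one common STE choice). The toy data, single-layer setup, and hyperparameters below are placeholders for illustration, not the paper's two-linear-layer construction.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 4))
w_true = np.array([1.0, -2.0, 0.5, 1.5])
y = np.sign(X @ w_true)               # labels from a planted linear rule

w = np.zeros(4)
eta = 0.1
for _ in range(100):
    z = X @ w
    pred = np.sign(z)                 # quantized (binary) activation: forward pass
    err = pred - y
    # Coarse/STE gradient: pretend d sign(z)/dz = 1 on |z| <= 1, and 0 outside,
    # since the true derivative is 0 almost everywhere and unusable for training.
    surrogate = (np.abs(z) <= 1.0).astype(float)
    grad = (err * surrogate) @ X / len(y)
    w -= eta * grad

acc = np.mean(np.sign(X @ w) == y)
print(acc)
```

The coarse gradient is not the gradient of any loss being minimized, which is why convergence guarantees like the one above require a careful choice of (monotone) surrogate.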
arXiv Detail & Related papers (2020-11-23T07:50:09Z)
- Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy [71.25689267025244]
We show how the transition is controlled by the relationship between the scale and how accurately we minimize the training loss.
Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies.
arXiv Detail & Related papers (2020-07-13T23:49:53Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.