GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial
Networks
- URL: http://arxiv.org/abs/2111.03162v1
- Date: Thu, 4 Nov 2021 21:13:02 GMT
- Title: GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial
Networks
- Authors: Vineeth S. Bhaskara, Tristan Aumentado-Armstrong, Allan Jepson, Alex
Levinshtein
- Abstract summary: Modern generative adversarial networks (GANs) predominantly use piecewise linear activation functions in discriminators (or critics).
We present Gradient Normalization (GraN), a novel input-dependent normalization method, which guarantees a piecewise K-Lipschitz constraint in the input space.
GraN does not constrain processing at the individual network layers, and, unlike gradient penalties, strictly enforces a piecewise Lipschitz constraint almost everywhere.
- Score: 2.3666095711348363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern generative adversarial networks (GANs) predominantly use piecewise
linear activation functions in discriminators (or critics), including ReLU and
LeakyReLU. Such models learn piecewise linear mappings, where each piece
handles a subset of the input space, and the gradients per subset are piecewise
constant. Under such a class of discriminator (or critic) functions, we present
Gradient Normalization (GraN), a novel input-dependent normalization method,
which guarantees a piecewise K-Lipschitz constraint in the input space. In
contrast to spectral normalization, GraN does not constrain processing at the
individual network layers, and, unlike gradient penalties, strictly enforces a
piecewise Lipschitz constraint almost everywhere. Empirically, we demonstrate
improved image generation performance across multiple datasets (incl.
CIFAR-10/100, STL-10, LSUN bedrooms, and CelebA), GAN loss functions, and
metrics. Further, we analyze altering the often untuned Lipschitz constant K in
several standard GANs, not only attaining significant performance gains, but
also finding connections between K and training dynamics, particularly in
low-gradient loss plateaus, with the common Adam optimizer.
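The abstract does not spell out the exact normalization formula, but the core idea it describes (rescaling a piecewise-linear critic by the norm of its input gradient so that each linear piece ends up with slope at most roughly K) can be sketched as below. This is a minimal illustrative sketch in PyTorch under that assumption, not the paper's implementation; `GradNormCritic`, `K`, and `eps` are hypothetical names.

```python
import torch
import torch.nn as nn

class GradNormCritic(nn.Module):
    """Rescales a piecewise-linear critic by the norm of its input gradient
    so that each linear piece of the normalized output has slope at most
    roughly K. Illustrative sketch only, not the paper's exact formulation."""

    def __init__(self, critic: nn.Module, K: float = 1.0, eps: float = 1e-8):
        super().__init__()
        self.critic = critic
        self.K = K
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not x.requires_grad:
            x = x.requires_grad_(True)
        f = self.critic(x)  # raw critic scores, shape (batch, 1)
        # Per-sample input gradient; create_graph=True keeps the
        # normalization differentiable for the critic/generator updates.
        grad = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
        grad_norm = grad.flatten(1).norm(dim=1, keepdim=True)
        # Within a linear region the gradient is constant, so dividing by
        # its norm caps the local slope of the normalized critic at ~K.
        return self.K * f / (grad_norm + self.eps)

# Hypothetical usage with a small LeakyReLU critic on 2-D inputs.
critic = nn.Sequential(nn.Linear(2, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
gran_critic = GradNormCritic(critic, K=1.0)
scores = gran_critic(torch.randn(8, 2))
```

Because the gradient is constant within each linear region of a ReLU/LeakyReLU critic, dividing by its norm bounds the slope of the normalized critic on that region, which is the piecewise K-Lipschitz property the abstract refers to.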
Related papers
- CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization [36.20084231028338]
Generative Adversarial Networks (GANs) significantly advanced image generation but their performance heavily depends on abundant training data.
In scenarios with limited data, GANs often struggle with discriminator overfitting and unstable training.
We present CHAIN, which replaces the conventional centering step with zero-mean regularization and integrates a Lipschitz continuity constraint in the scaling step.
arXiv Detail & Related papers (2024-03-31T01:41:36Z)
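The CHAIN summary above only names its two modifications (zero-mean regularization in place of centering, and a Lipschitz-constrained scaling step). The sketch below is one hedged reading of that description, not the actual CHAIN layer; `ZeroMeanLipschitzNorm`, `max_scale`, and `penalty_weight` are hypothetical.

```python
import torch
import torch.nn as nn

class ZeroMeanLipschitzNorm(nn.Module):
    """Hypothetical reading of the CHAIN summary: no explicit centering
    (a zero-mean penalty is returned instead) and a scaling step clamped
    to a user-chosen bound. Not the actual CHAIN layer."""

    def __init__(self, num_features: int, max_scale: float = 1.0,
                 penalty_weight: float = 0.01):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.max_scale = max_scale
        self.penalty_weight = penalty_weight

    def forward(self, x: torch.Tensor):
        # Zero-mean regularization in place of subtracting the batch mean.
        mean_penalty = self.penalty_weight * x.mean(dim=0).pow(2).sum()
        # Learned per-feature scaling, clamped so the layer's slope
        # (and hence its Lipschitz constant) stays bounded.
        std = x.std(dim=0, unbiased=False) + 1e-5
        scale = (self.gamma / std).clamp(min=-self.max_scale, max=self.max_scale)
        return x * scale, mean_penalty

# Hypothetical usage: the penalty would be added to the discriminator loss.
norm = ZeroMeanLipschitzNorm(64)
out, penalty = norm(torch.randn(16, 64))
```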
- The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalization of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- Gradient descent provably escapes saddle points in the training of shallow ReLU networks [6.458742319938318]
We prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements.
Building on a detailed examination of critical points of the square integral loss function for shallow ReLU and leaky ReLU networks, we show that gradient descent escapes most saddle points.
arXiv Detail & Related papers (2022-08-03T14:08:52Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z)
- Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification [3.158346511479111]
We propose a class of STEs with certain monotonicity, and consider their applications to the training of a two-linear-layer network with quantized activation functions.
We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum.
arXiv Detail & Related papers (2020-11-23T07:50:09Z)
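The specific monotone class of STEs studied in the entry above is not described in the summary; the sketch below shows a textbook straight-through estimator applied to a two-linear-layer network with a quantized (sign) activation, as a generic illustration of that setting. `SignSTE` and `TwoLayerQuantizedNet` are hypothetical names.

```python
import torch
import torch.nn as nn

class SignSTE(torch.autograd.Function):
    """Textbook straight-through estimator: binarize in the forward pass,
    use a clipped-identity surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the gradient only where the pre-activation is small,
        # i.e. the derivative of a hard-tanh surrogate.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

class TwoLayerQuantizedNet(nn.Module):
    """Two-linear-layer network with a quantized activation, mirroring the
    setting described above (layer sizes here are arbitrary)."""

    def __init__(self, in_dim: int = 10, hidden: int = 32, out_dim: int = 1):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(SignSTE.apply(self.fc1(x)))

# Hypothetical usage.
net = TwoLayerQuantizedNet()
logits = net(torch.randn(4, 10))
```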
- Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy [71.25689267025244]
We show how the transition is controlled by the relationship between the initialization scale and how accurately we minimize the training loss.
Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies.
arXiv Detail & Related papers (2020-07-13T23:49:53Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.