Related papers: Double Descent and Other Interpolation Phenomena in GANs

URL: http://arxiv.org/abs/2106.04003v2
Date: Wed, 1 May 2024 01:48:52 GMT
Title: Double Descent and Other Interpolation Phenomena in GANs
Authors: Lorenzo Luzi, Yehuda Dar, Richard Baraniuk
Abstract summary: We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs). We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples. While our analysis focuses mostly on linear models, we also apply important insights for improving generalization of nonlinear, multilayer GANs.
Score: 2.7007335372861974
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study overparameterization in generative adversarial networks (GANs) that can interpolate the training data. We show that overparameterization can improve generalization performance and accelerate the training process. We study the generalization error as a function of latent space dimension and identify two main behaviors, depending on the learning setting. First, we show that overparameterized generative models that learn distributions by minimizing a metric or $f$-divergence do not exhibit double descent in generalization errors; specifically, all the interpolating solutions achieve the same generalization error. Second, we develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples. Our pseudo-supervised setting exhibits double descent (and in some cases, triple descent) of generalization errors. We combine pseudo-supervision with overparameterization (i.e., overly large latent space dimension) to accelerate training while matching or even surpassing generalization performance without pseudo-supervision. While our analysis focuses mostly on linear models, we also apply important insights for improving generalization of nonlinear, multilayer GANs.
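The pseudo-supervised idea in the abstract can be sketched for the linear case. This is a minimal sketch under our own assumptions, not the authors' code. It assumes a linear generator G(z) = W z and pairs one fabricated standard Gaussian latent code z_i with each real sample x_i. W is then fit by minimum-norm least squares (NumPy's `pinv`), so once the latent dimension d is at least the number of samples n, the fit interpolates the training data. The helper names and the error measure are hypothetical choices for illustration. The error is the Frobenius distance between the generator's output covariance W Wᵀ and the true data covariance, and it may differ from the metric used in the paper.

```python
import numpy as np

def fit_pseudo_supervised_linear_generator(X, latent_dim, rng):
    """Fit a linear generator G(z) = W z from fabricated (z_i, x_i) pairs.

    Hypothetical helper. X: (n, m) array of real samples.
    Returns W with shape (m, latent_dim) and the latent codes Z with shape (n, latent_dim).
    """
    n = X.shape[0]
    # Fabricated inputs: one standard Gaussian latent code per real sample.
    Z = rng.standard_normal((n, latent_dim))
    # Minimum-norm least-squares solution of Z @ W.T ~= X.
    # When latent_dim >= n and Z has full row rank, this interpolates: Z @ W.T == X.
    W = (np.linalg.pinv(Z) @ X).T
    return W, Z

def covariance_error(W, sigma):
    """Assumed error measure: Frobenius distance between the output covariance
    W W^T of G(z) for z ~ N(0, I) and the true data covariance sigma."""
    return np.linalg.norm(W @ W.T - sigma, ord="fro")

# Illustrative sweep over the latent dimension (all sizes are arbitrary choices).
rng = np.random.default_rng(0)
m, n = 10, 20
A = rng.standard_normal((m, m)) / np.sqrt(m)
sigma = A @ A.T
X = rng.standard_normal((n, m)) @ A.T  # rows are samples from N(0, sigma)
for d in (5, 10, 20, 40, 80):
    W, Z = fit_pseudo_supervised_linear_generator(X, d, rng)
    print(d, covariance_error(W, sigma))
```

Sweeping `latent_dim` past `n` moves the fit from the underparameterized regime into the interpolating regime. Whether and where the resulting curve shows double descent depends on these assumptions, so the printout is an exploration aid rather than a reproduction of the paper's results.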

Related papers

Exact, Tractable Gauss-Newton Optimization in Deep Reversible Architectures Reveal Poor Generalization [52.16435732772263]
Second-order optimization has been shown to accelerate the training of deep neural networks in many applications. However, generalization properties of second-order methods are still being debated. We show for the first time that exact Gauss-Newton (GN) updates take on a tractable form in a class of deep architectures.
arXiv Detail & Related papers (2024-11-12T17:58:40Z)
HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs). We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets out the classical statistical learning framework and introduces the double descent phenomenon. By looking at a number of examples, Section 2 introduces inductive biases that appear to play a key role in double descent by selecting [...]. Section 3 explores double descent with two linear models and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z)
Learning Trajectories are Generalization Indicators [44.53518627207067]
This paper explores the connection between learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities. We present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error. Our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels.
arXiv Detail & Related papers (2023-04-25T05:08:57Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-convex concave min-max optimization problems. Prior theory has shown the importance of overparameterization for the convergence of gradient descent (GD) to globally optimal solutions. We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, gradient descent/ascent (GDA) converges to a global saddle point of the underlying non-convex concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z)
Linear Regression with Distributed Learning: A Generalization Error Perspective [0.0]
We investigate the performance of distributed learning for large-scale linear regression. We focus on the generalization error, i.e., the performance on unseen data. Our results show that the generalization error of the distributed solution can be substantially higher than that of the centralized solution.
arXiv Detail & Related papers (2021-01-22T08:43:28Z)
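The double descent tutorial listed above studies the phenomenon with linear models. A common textbook illustration of that effect fits minimum-norm least squares on the first p of d_total Gaussian features and tracks test error as p crosses the number of training samples. It is sketched here under our own assumptions rather than taken from the tutorial. The function name, sample sizes, noise level, and ground-truth model are hypothetical. `pinv` returns the ordinary least-squares solution when p < n_train and the minimum-norm interpolator when p ≥ n_train.

```python
import numpy as np

def min_norm_regression_errors(p, n_train=40, n_test=1000, d_total=200, noise=0.5, seed=0):
    """Hypothetical helper: train/test MSE of min-norm least squares on the first p features."""
    rng = np.random.default_rng(seed)
    # Assumed ground truth: a linear target over all d_total Gaussian features.
    beta = rng.standard_normal(d_total) / np.sqrt(d_total)
    X_train = rng.standard_normal((n_train, d_total))
    X_test = rng.standard_normal((n_test, d_total))
    y_train = X_train @ beta + noise * rng.standard_normal(n_train)
    y_test = X_test @ beta  # noiseless test targets
    # The model only sees the first p features. pinv gives the least-squares fit
    # for p < n_train and the minimum-norm interpolator for p >= n_train.
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
    train_mse = np.mean((X_train[:, :p] @ beta_hat - y_train) ** 2)
    test_mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    return train_mse, test_mse

# The same seed reuses the same data, so the feature sets are nested across p.
for p in (5, 20, 35, 40, 45, 80, 200):
    train_mse, test_mse = min_norm_regression_errors(p)
    print(f"p={p:3d}  train={train_mse:.3g}  test={test_mse:.3g}")
```

In this kind of setup the test error typically spikes as p approaches `n_train` and can decrease again beyond it, while the training error drops to numerical zero once the model interpolates. A single seed gives a noisy curve, so averaging over several seeds gives a clearer picture.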

This list is automatically generated from the titles and abstracts of the papers on this site.

This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.