On Predicting Generalization using GANs
- URL: http://arxiv.org/abs/2111.14212v1
- Date: Sun, 28 Nov 2021 19:03:21 GMT
- Title: On Predicting Generalization using GANs
- Authors: Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora
- Abstract summary: Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.
This paper investigates whether test error can be predicted using 'synthetic data' produced by a Generative Adversarial Network (GAN).
GANs have well-known limitations (e.g. mode collapse) and are known to not learn the data distribution accurately.
- Score: 34.13321525940004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research on generalization bounds for deep networks seeks to give ways to
predict test error using just the training dataset and the network parameters.
While generalization bounds can give many insights about architecture design,
training algorithms etc., what they do not currently do is yield good
predictions for actual test error. A recently introduced Predicting
Generalization in Deep Learning competition aims to encourage discovery of
methods to better predict test error. The current paper investigates a simple
idea: can test error be predicted using 'synthetic data' produced using a
Generative Adversarial Network (GAN) that was trained on the same training
dataset? Upon investigating several GAN models and architectures, we find that
this turns out to be the case. In fact, using GANs pre-trained on standard
datasets, the test error can be predicted without requiring any additional
hyper-parameter tuning. This result is surprising because GANs have well-known
limitations (e.g. mode collapse) and are known to not learn the data
distribution accurately. Yet the generated samples are good enough to
substitute for test data. Several additional experiments are presented to
explore reasons why GANs do well at this task. In addition to a new approach
for predicting generalization, the counter-intuitive phenomena presented in our
work may also call for a better understanding of GANs' strengths and
limitations.
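A minimal sketch of the idea, assuming a class-conditional generator `G(z, y)` pre-trained on the same training set and a trained classifier `clf` (both interfaces are illustrative placeholders, not the paper's exact models): sample labelled synthetic images from the GAN and report the classifier's error on them as the predicted test error.

```python
import torch

@torch.no_grad()
def predicted_test_error(clf, G, num_classes, n_samples=10_000,
                         batch_size=256, z_dim=128, device="cpu"):
    """Estimate test error by scoring the classifier on GAN-generated data.

    Assumes a class-conditional generator with signature G(z, y) -> images;
    the labels used to condition the GAN serve as ground truth for the
    synthetic samples (a sketch of the idea, not the paper's exact setup).
    """
    clf.eval()
    G.eval()
    errors, seen = 0, 0
    while seen < n_samples:
        b = min(batch_size, n_samples - seen)
        y = torch.randint(0, num_classes, (b,), device=device)
        z = torch.randn(b, z_dim, device=device)
        x_fake = G(z, y)                       # synthetic images labelled by y
        preds = clf(x_fake).argmax(dim=1)
        errors += (preds != y).sum().item()
        seen += b
    return errors / seen                       # proxy for the true test error
```

Because the GAN is fixed and reused across classifiers, this proxy needs no per-model hyper-parameter tuning, in line with the abstract's claim.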
Related papers
- Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions on the distribution of the data we are fitting to and, therefore, has better representation capabilities.
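A rough sketch of this adversarial formulation of regression, using small placeholder networks in PyTorch (all architectures and hyper-parameters here are illustrative assumptions, not the paper's): the regressor acts as a conditional generator, and a discriminator is trained to distinguish real (x, y) pairs from (x, f(x)) pairs.

```python
import torch
import torch.nn as nn

x_dim, y_dim, noise_dim = 8, 1, 4

# Regressor acts as a conditional generator: y_hat = f(x, eps).
regressor = nn.Sequential(nn.Linear(x_dim + noise_dim, 64), nn.ReLU(),
                          nn.Linear(64, y_dim))
# Discriminator scores (x, y) pairs as real or generated.
discriminator = nn.Sequential(nn.Linear(x_dim + y_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1))

opt_g = torch.optim.Adam(regressor.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y):
    # x: (batch, x_dim), y: (batch, y_dim)
    eps = torch.randn(x.size(0), noise_dim)
    y_fake = regressor(torch.cat([x, eps], dim=1))

    # Discriminator: real (x, y) pairs vs. generated (x, y_fake) pairs.
    d_real = discriminator(torch.cat([x, y], dim=1))
    d_fake = discriminator(torch.cat([x, y_fake.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Regressor: make (x, y_fake) indistinguishable from real pairs.
    d_fake = discriminator(torch.cat([x, y_fake], dim=1))
    g_loss = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```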
arXiv Detail & Related papers (2024-04-21T01:27:47Z)
- Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment [25.527665632625627]
It is important to reason about the trustworthiness of the model's predictions with unseen data during deployment.
Existing methods for specifying and verifying traditional software are insufficient for this task.
We propose a novel technique that uses rules derived from neural network computations to infer data preconditions.
arXiv Detail & Related papers (2024-01-26T03:47:18Z)
- Bounding generalization error with input compression: An empirical study with infinite-width networks [16.17600110257266]
Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on availability of held-out data.
In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations.
An existing input compression-based GE bound is used to link MI and GE.
arXiv Detail & Related papers (2022-07-19T17:05:02Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) means finding a small subset of the input graph's features that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
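A toy illustration of this decomposition, with a hypothetical two-layer ReLU network and synthetic data (nothing here comes from the paper): samples are grouped by their ReLU activation pattern, i.e. by subfunction, and the full network's empirical error is recovered as the pattern-frequency-weighted average of per-subfunction errors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy piecewise-linear network and data (shapes and labels are illustrative).
net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(512, 2)
y = (x[:, 0] * x[:, 1] > 0).long()           # arbitrary synthetic labels

with torch.no_grad():
    hidden = net[1](net[0](x))                # post-ReLU activations
    pattern = (hidden > 0).long()             # activation pattern: which units fire
    preds = net(x).argmax(dim=1)
    wrong = (preds != y).float()

# Group samples by activation pattern: each pattern indexes one subfunction.
codes = ["".join(map(str, p.tolist())) for p in pattern]
per_pattern = {}
for c, w in zip(codes, wrong.tolist()):
    n, e = per_pattern.get(c, (0, 0.0))
    per_pattern[c] = (n + 1, e + w)

# Full-network empirical error equals the pattern-frequency-weighted average
# of the subfunction errors (an expectation over subfunctions).
total = sum(n for n, _ in per_pattern.values())
expectation = sum((n / total) * (e / n) for n, e in per_pattern.values())
print(round(expectation, 4), round(wrong.mean().item(), 4))  # should match
```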
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
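A minimal sketch of the entropy-raising step only, assuming the overconfident inputs have already been identified by some detection scheme (how they are found is the paper's contribution and is not reproduced here); `prior_entropy_penalty` and `lambda_cal` are made-up names for illustration.

```python
import torch
import torch.nn.functional as F

def prior_entropy_penalty(logits_aug, label_prior):
    """Push predictions on suspect inputs towards the label prior.

    `logits_aug` are the model's logits on inputs assumed to lie in
    overconfident regions; `label_prior` is the empirical class distribution
    of the training labels, shape (num_classes,). Minimising
    KL(prior || p) raises the entropy of these predictions towards that of
    the prior.
    """
    log_p = F.log_softmax(logits_aug, dim=1)
    return F.kl_div(log_p, label_prior.expand_as(log_p), reduction="batchmean")

# Usage sketch: total loss = task loss on real data + weighted penalty on
# the flagged inputs (lambda_cal is a made-up hyper-parameter).
# loss = F.cross_entropy(logits_real, y) + lambda_cal * prior_entropy_penalty(logits_aug, prior)
```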
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.