Bounding generalization error with input compression: An empirical study
with infinite-width networks
- URL: http://arxiv.org/abs/2207.09408v1
- Date: Tue, 19 Jul 2022 17:05:02 GMT
- Title: Bounding generalization error with input compression: An empirical study
with infinite-width networks
- Authors: Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani
Ioannou, Graham W. Taylor
- Abstract summary: Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data.
In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations.
An existing input compression-based GE bound is used to link MI and GE.
- Score: 16.17600110257266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an
important task that often relies on the availability of held-out data. The ability
to better predict GE based on a single training set may yield overarching DNN
design principles to reduce reliance on trial and error, along with other
performance assessment advantages. In search of a quantity relevant to GE, we
investigate the Mutual Information (MI) between the input and final layer
representations, using the infinite-width DNN limit to bound MI. An existing
input compression-based GE bound is used to link MI and GE. To the best of our
knowledge, this represents the first empirical study of this bound. In our
attempt to empirically falsify the theoretical bound, we find that it is often
tight for best-performing models. Furthermore, it detects randomization of
training labels in many cases, reflects test-time perturbation robustness, and
works well given only a few training samples. These results are promising given
that input compression is broadly applicable where MI can be estimated with
confidence.
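To make the link between MI and GE concrete, the input compression bound used in this line of work is roughly of the form GE <= sqrt((2^I(X;Z) + log(2/delta)) / (2N)), where I(X;Z) is the mutual information (in bits) between the input and the final-layer representation and N is the training-set size. Below is a minimal sketch of evaluating such a bound from an MI estimate; the function name and the exact constants are illustrative assumptions, not the paper's precise statement.

```python
import numpy as np

def input_compression_bound(mi_bits, n_train, delta=0.05):
    """Sketch of an input-compression generalization bound of the approximate
    form GE <= sqrt((2**I(X;Z) + log(2/delta)) / (2*N)), where I(X;Z) is the
    mutual information (in bits) between the input and the final-layer
    representation. Constants are illustrative, not the paper's exact ones."""
    return np.sqrt((2.0 ** mi_bits + np.log(2.0 / delta)) / (2.0 * n_train))

# Example: a network whose final layer retains ~10 bits of information about
# the input, trained on 50,000 examples.
print(input_compression_bound(mi_bits=10.0, n_train=50_000))
```

The bound tightens as the representation compresses the input (smaller I(X;Z)) or as the training set grows, which is why an MI estimate from the infinite-width limit can serve as a GE predictor.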
Related papers
- YOSO: You-Only-Sample-Once via Compressed Sensing for Graph Neural Network Training [9.02251811867533]
YOSO (You-Only-Sample-Once) is an algorithm designed to achieve efficient training while preserving prediction accuracy.
YOSO not only avoids costly computations in traditional compressed sensing (CS) methods, such as orthonormal basis calculations, but also ensures high-probability accuracy retention.
arXiv Detail & Related papers (2024-11-08T16:47:51Z)
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Slicing Mutual Information Generalization Bounds for Neural Networks [14.48773730230054]
We introduce new, tighter information-theoretic generalization bounds tailored for deep learning algorithms.
Our bounds offer significant computational and statistical advantages over standard MI bounds.
We extend our analysis to algorithms whose parameters do not need to exactly lie on random subspaces.
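The "slicing" idea can be pictured as reasoning about parameters projected onto random low-dimensional subspaces, with a correction for parameters that only lie near the subspace. A minimal illustrative sketch follows (not the authors' algorithm); the projection dimension and Gaussian directions are assumptions.

```python
import numpy as np

def project_onto_random_subspace(theta, k, rng=np.random.default_rng(0)):
    """Project a flat parameter vector onto a random k-dimensional subspace
    and report how far the parameters are from lying exactly on it -- the
    basic picture behind 'sliced' parameter-space bounds."""
    d = theta.shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis, d x k
    coords = Q.T @ theta                              # k-dimensional coordinates
    residual = np.linalg.norm(theta - Q @ coords)     # distance from the subspace
    return coords, residual

coords, residual = project_onto_random_subspace(np.random.randn(10_000), k=32)
print(coords.shape, residual)
```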
arXiv Detail & Related papers (2024-06-06T13:15:37Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
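One simple way to act on the observation that predictions drift toward a constant value on OOD inputs is to fall back to a cautious default whenever the predicted distribution is close to that constant. The sketch below is only an assumed illustration of that idea; the threshold and the choice of the marginal label distribution as the constant are not the paper's exact procedure.

```python
import numpy as np

def risk_sensitive_decision(probs, constant, abstain_cost=0.3, tol=0.1):
    """If the predicted class distribution has drifted close to the constant
    (e.g. marginal-label) distribution -- a sign the input may be OOD --
    abstain instead of committing to a class."""
    drift = np.abs(probs - constant).sum()   # L1 distance to the constant prediction
    if drift < tol:
        return "abstain", abstain_cost
    return int(np.argmax(probs)), 0.0

marginal = np.array([0.1] * 10)              # assumed class marginals over 10 classes
print(risk_sensitive_decision(np.array([0.11, 0.09] + [0.1] * 8), marginal))  # near-constant: abstain
print(risk_sensitive_decision(np.eye(10)[3], marginal))                       # confident in-distribution case
```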
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce sample complexity and improve convergence without compromising test accuracy.
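A minimal sketch of the neuron-pruning half of that recipe: rank hidden units by their incoming-weight magnitude and drop the smallest. The layer shapes and pruning ratio below are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def prune_lowest_magnitude_neurons(W_in, W_out, keep_ratio=0.5):
    """Remove the hidden neurons whose incoming-weight norms are smallest.
    W_in:  (hidden, in_features)  weights into the hidden layer
    W_out: (out_features, hidden) weights out of the hidden layer"""
    norms = np.linalg.norm(W_in, axis=1)          # per-neuron weight magnitude
    k = max(1, int(keep_ratio * W_in.shape[0]))
    keep = np.argsort(norms)[-k:]                 # indices of the largest-norm neurons
    return W_in[keep], W_out[:, keep]

W_in, W_out = np.random.randn(64, 128), np.random.randn(10, 64)
W_in_p, W_out_p = prune_lowest_magnitude_neurons(W_in, W_out)
print(W_in_p.shape, W_out_p.shape)  # (32, 128) (10, 32)
```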
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness [33.09831377640498]
We study approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation.
We propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs.
On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection.
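SNGP combines two ingredients: spectral normalization of hidden weights (keeping the feature map approximately distance-preserving) and a Gaussian-process-style output layer, commonly approximated with random features. The sketch below assumes a random-Fourier-feature output layer and illustrative hyperparameters; it is not the reference implementation.

```python
import numpy as np

def spectral_normalize(W, c=0.95):
    """Rescale W so its largest singular value is at most c
    (the spectral-normalization step, sketched)."""
    s = np.linalg.norm(W, 2)                  # largest singular value
    return W * min(1.0, c / s)

def gp_random_features(h, n_features=256, rng=np.random.default_rng(0)):
    """Random-Fourier-feature approximation of an RBF Gaussian-process output layer."""
    d = h.shape[-1]
    Omega = rng.standard_normal((d, n_features))
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(h @ Omega + b)

W = spectral_normalize(np.random.randn(128, 64))          # one normalized hidden layer
phi = gp_random_features(np.random.randn(32, 128) @ W)    # GP features for a batch of 32
print(phi.shape)                                           # (32, 256)
```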
arXiv Detail & Related papers (2022-05-01T05:46:13Z)
- On Predicting Generalization using GANs [34.13321525940004]
Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.
This paper investigates whether test error can be predicted using 'synthetic data' produced by a Generative Adversarial Network (GAN).
GANs have well-known limitations (e.g. mode collapse) and are known to not learn the data distribution accurately.
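The proposal can be sketched as: fit a generator to the training set only, then score the classifier on generator samples as a stand-in for held-out data. The function names and the classifier/generator interfaces below are assumptions made for illustration.

```python
import numpy as np

def predicted_test_error(classifier, generator, n_samples=10_000):
    """Estimate test error by scoring the classifier on labelled samples drawn
    from a GAN (e.g. a conditional GAN) trained only on the training set.
    `generator(n)` is assumed to return (synthetic_inputs, labels)."""
    x_syn, y_syn = generator(n_samples)
    y_hat = classifier(x_syn)
    return float(np.mean(y_hat != y_syn))

# Toy stand-ins so the sketch runs end to end.
rng = np.random.default_rng(0)
fake_generator = lambda n: (rng.standard_normal((n, 8)), rng.integers(0, 2, n))
fake_classifier = lambda x: (x[:, 0] > 0).astype(int)
print(predicted_test_error(fake_classifier, fake_generator))
```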
arXiv Detail & Related papers (2021-11-28T19:03:21Z)
- A Biased Graph Neural Network Sampler with Near-Optimal Regret [57.70126763759996]
Graph neural networks (GNNs) have emerged as a vehicle for applying deep network architectures to graph and relational data.
In this paper, we build upon existing work and treat GNN neighbor sampling as a multi-armed bandit problem.
We introduce a newly designed reward function that deliberately incorporates some bias in order to reduce variance and avoid unstable, possibly unbounded payouts.
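A minimal sketch of the bandit framing: treat each candidate neighbor as an arm and select neighbors with a UCB-style score. The generic random reward below is only a placeholder, not the paper's bias-corrected reward function.

```python
import numpy as np

class NeighborBandit:
    """UCB-style multi-armed bandit over a node's candidate neighbors (sketch)."""
    def __init__(self, n_neighbors):
        self.counts = np.zeros(n_neighbors)
        self.values = np.zeros(n_neighbors)
        self.t = 0

    def select(self, k):
        """Pick the k neighbors with the highest upper-confidence score."""
        self.t += 1
        ucb = self.values + np.sqrt(2 * np.log(self.t) / (self.counts + 1e-9))
        return np.argsort(ucb)[-k:]

    def update(self, arm, reward):
        """Running-mean update of the sampled neighbor's estimated value."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = NeighborBandit(n_neighbors=20)
for _ in range(100):
    for arm in bandit.select(k=5):
        bandit.update(arm, reward=np.random.rand())  # placeholder reward signal
print(bandit.counts)
```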
arXiv Detail & Related papers (2021-03-01T15:55:58Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAEs) are a powerful and widely used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for Gaussian Mixture Models (GMMs).
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
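For reference, the standard Cauchy-Schwarz divergence between densities p and q, which admits a closed form when both are Gaussian mixtures, is

```latex
D_{\mathrm{CS}}(p, q) \;=\; -\log \frac{\int p(x)\, q(x)\, dx}{\sqrt{\int p(x)^2\, dx \;\int q(x)^2\, dx}}
```

It is zero exactly when p = q, which is what makes it usable as a tractable surrogate objective for mixture-based latent priors.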
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- A Survey on Assessing the Generalization Envelope of Deep Neural Networks: Predictive Uncertainty, Out-of-distribution and Adversarial Samples [77.99182201815763]
Deep Neural Networks (DNNs) achieve state-of-the-art performance on numerous applications.
It is difficult to tell beforehand whether a DNN receiving an input will deliver the correct output, since its decision criteria are usually nontransparent.
This survey connects the three fields named in the title (predictive uncertainty, out-of-distribution detection, and adversarial samples) within the larger framework of investigating the generalization performance of machine learning methods, and of DNNs in particular.
arXiv Detail & Related papers (2020-08-21T09:12:52Z)