Explaining generalization in deep learning: progress and fundamental
limits
- URL: http://arxiv.org/abs/2110.08922v1
- Date: Sun, 17 Oct 2021 21:17:30 GMT
- Title: Explaining generalization in deep learning: progress and fundamental
limits
- Authors: Vaishnavh Nagarajan
- Abstract summary: In the first part of the thesis, we will empirically study how training deep networks via gradient descent implicitly controls the networks' capacity.
We will then derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count.
In the last part of the thesis, we will introduce an empirical technique to estimate generalization using unlabeled data.
- Score: 8.299945169799795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This dissertation studies a fundamental open challenge in deep learning
theory: why do deep networks generalize well even while being
overparameterized, unregularized and fitting the training data to zero error?
In the first part of the thesis, we will empirically study how training deep
networks via stochastic gradient descent implicitly controls the networks'
capacity. Subsequently, to show how this leads to better generalization, we
will derive data-dependent, uniform-convergence-based generalization
bounds with improved dependencies on the parameter count.
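As a rough template (a generic uniform-convergence-style bound, not the thesis's specific result), such bounds relate the population loss $L_{\mathcal{D}}(h_S)$ of the learned network $h_S$ to its training loss $\widehat{L}_S(h_S)$ plus a complexity term:
$$ L_{\mathcal{D}}(h_S) \;\le\; \widehat{L}_S(h_S) \;+\; \mathcal{O}\!\left(\sqrt{\frac{\mathcal{C}(\mathcal{H})}{m}}\right), $$
where $m$ is the training-set size and $\mathcal{C}(\mathcal{H})$ is a complexity measure of the hypothesis class (classically one that grows with the parameter count). Data-dependent bounds shrink the effective class $\mathcal{H}$ to hypotheses the algorithm actually reaches on data from the distribution, which is how improved dependence on the parameter count becomes possible.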
Uniform convergence has in fact been the most widely used tool in deep
learning literature, thanks to its simplicity and generality. Given its
popularity, in this thesis, we will also take a step back to identify the
fundamental limits of uniform convergence as a tool to explain generalization.
In particular, we will show that in some example overparameterized settings,
any uniform convergence bound will provide only a vacuous generalization
bound.
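Schematically, in the notation above (an illustrative rendering of vacuity, not a precise statement of the thesis's theorems), these settings satisfy
$$ \sup_{h \in \mathcal{H}_{\mathrm{alg}}} \big| L_{\mathcal{D}}(h) - \widehat{L}_S(h) \big| \;\approx\; 1 \qquad \text{even though} \qquad L_{\mathcal{D}}(h_S) - \widehat{L}_S(h_S) \;\approx\; 0, $$
where $\mathcal{H}_{\mathrm{alg}}$ is any class containing the hypotheses the algorithm outputs on typical training sets; even the tightest algorithm-dependent uniform convergence bound then stays near the trivial value of 1.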
With this realization in mind, in the last part of the thesis, we will change
course and introduce an empirical technique to estimate generalization
using unlabeled data. Our technique does not rely on any notion of
uniform-convergence-based complexity and is remarkably precise. We will
theoretically show why our technique enjoys such precision.
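One concrete instantiation of this idea is a disagreement-style estimator: train two networks with independent SGD runs and use the rate at which they disagree on unlabeled data as a proxy for test error. The sketch below is illustrative only, not claimed to be the exact procedure developed in the thesis; the model objects in the usage comments are hypothetical.

```python
# Illustrative sketch: disagreement-based estimate of test error from
# unlabeled data. Not claimed to be the thesis's exact procedure.
import numpy as np

def disagreement_rate(preds_a: np.ndarray, preds_b: np.ndarray) -> float:
    """Fraction of unlabeled points on which two classifiers disagree."""
    if preds_a.shape != preds_b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(preds_a != preds_b))

# Usage (hypothetical model objects): train model_a and model_b with
# independent SGD runs (different seeds / data orders), predict labels on a
# pool of unlabeled inputs, and read off the disagreement rate as an
# estimate of the test error.
# preds_a = model_a.predict(unlabeled_x)
# preds_b = model_b.predict(unlabeled_x)
# estimated_test_error = disagreement_rate(preds_a, preds_b)
```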
We will conclude by discussing how future work could explore novel ways to
incorporate distributional assumptions in generalization bounds (such as in the
form of unlabeled data) and explore other tools to derive bounds, perhaps by
modifying uniform convergence or by developing completely new tools altogether.
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work informs an important direction: enhancing the diversity or balance of the data rather than simply scaling up its absolute size.
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- Fantastic Generalization Measures are Nowhere to be Found [14.599761709255917]
We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small.
Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize.
arXiv Detail & Related papers (2023-09-24T14:53:51Z)
- Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z)
- Generalization Bounds with Data-dependent Fractal Dimensions [5.833272638548154]
We prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption.
Despite introducing significant technical complications, this new notion lets us control the generalization error.
arXiv Detail & Related papers (2023-02-06T13:24:48Z)
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z)
- An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.