Does the Data Induce Capacity Control in Deep Learning?
- URL: http://arxiv.org/abs/2110.14163v1
- Date: Wed, 27 Oct 2021 04:40:27 GMT
- Title: Does the Data Induce Capacity Control in Deep Learning?
- Authors: Yang Rubing, Mao Jialin, Chaudhari Pratik
- Abstract summary: This paper studies how the dataset may be the cause of the anomalous generalization performance of deep networks.
We show that the data correlation matrix of typical classification datasets has an eigenspectrum where, after a sharp initial drop, a large number of small eigenvalues are distributed uniformly over an exponentially large range.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper studies how the dataset may be the cause of the anomalous
generalization performance of deep networks. We show that the data correlation
matrix of typical classification datasets has an eigenspectrum where, after a
sharp initial drop, a large number of small eigenvalues are distributed
uniformly over an exponentially large range. This structure is mirrored in a
network trained on this data: we show that the Hessian and the Fisher
Information Matrix (FIM) have eigenvalues that are spread uniformly over
exponentially large ranges. We call such eigenspectra "sloppy" because sets of
weights corresponding to small eigenvalues can be changed by large magnitudes
without affecting the loss. Networks trained on atypical, non-sloppy synthetic
data do not share these traits. We show how this structure in the data can give
rise to non-vacuous PAC-Bayes generalization bounds analytically; we also construct
data-distribution dependent priors that lead to accurate bounds using numerical
optimization.
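The "sloppy" eigenspectrum described in the abstract can be illustrated with a minimal sketch (assumption: a power-law-correlated Gaussian surrogate stands in for the paper's image datasets, chosen only so that the spectrum has a sharp drop followed by eigenvalues spread over many orders of magnitude):

```python
import numpy as np

rng = np.random.default_rng(0)

# Surrogate data: correlated Gaussian features standing in for image pixels.
# (Assumption: a power-law covariance, used only to produce a sloppy spectrum.)
d, n = 100, 5000
scales = np.arange(1, d + 1) ** -1.5             # population eigenvalues decay as a power law
X = rng.standard_normal((n, d)) * np.sqrt(scales)

# Eigenspectrum of the data correlation matrix X^T X / n, sorted descending.
eigvals = np.linalg.eigvalsh(X.T @ X / n)[::-1]

# "Sloppy": after the initial drop, the log-eigenvalues are spread roughly
# uniformly over an exponentially large range.
log_eig = np.log10(eigvals)
print(f"spectrum spans {log_eig[0] - log_eig[-1]:.1f} orders of magnitude")
```

On real classification datasets the paper reports the same qualitative shape for the data correlation matrix, the Hessian, and the FIM.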
Related papers
- An Effective Gram Matrix Characterizes Generalization in Deep Networks [22.314071077213935]
We derive a differential equation that governs the evolution of the generalization gap when a deep network is trained by gradient descent.
We analyze this differential equation to compute an "effective Gram matrix" that characterizes the generalization gap after training.
arXiv Detail & Related papers (2025-04-23T06:24:42Z) - Demystifying Spectral Bias on Real-World Data [2.3020018305241337]
Kernel ridge regression (KRR) and Gaussian processes (GPs) are fundamental tools in statistics and machine learning.
We consider cross-dataset learnability and show that one may use eigenvalues and eigenfunctions associated with highly idealized data measures to reveal spectral bias on complex datasets.
arXiv Detail & Related papers (2024-06-04T18:00:00Z) - Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction [55.57072563835959]
We propose an eigenvalue correction strategy that can free filters from the constraints of repeated eigenvalue inputs.
Concretely, the proposed eigenvalue correction strategy enhances the uniform distribution of eigenvalues, and improves the fitting capacity and expressive power of filters.
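A hedged sketch of what such a correction step might look like: blend the (possibly repeated) eigenvalues with an equispaced sequence so that the inputs to a spectral filter become distinct and more uniformly distributed (the blending coefficient `beta` and the `[0, 2]` normalized-Laplacian range are assumptions of this sketch, not the paper's exact procedure):

```python
import numpy as np

def correct_eigenvalues(eigvals, beta=0.5):
    # Blend sorted eigenvalues with a uniform grid on [0, 2], the range of a
    # normalized graph Laplacian, to break repeats. (Hypothetical parameters.)
    k = len(eigvals)
    equispaced = np.linspace(0.0, 2.0, k)
    return beta * np.sort(eigvals) + (1 - beta) * equispaced

lam = np.array([0.0, 1.0, 1.0, 1.0, 2.0])        # repeated eigenvalue 1.0
lam_corrected = correct_eigenvalues(lam)
print(lam_corrected)                              # strictly increasing, no repeats
```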
arXiv Detail & Related papers (2024-01-28T08:12:00Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Gaussian Universality of Linear Classifiers with Random Labels in
High-Dimension [24.503842578208268]
We prove that data coming from a range of generative models in high dimensions have the same minimum training loss as Gaussian data with the corresponding data covariance.
In particular, our theorem covers data created by an arbitrary mixture of homogeneous Gaussian clouds, as well as multi-modal generative neural networks.
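The universality claim can be probed numerically; a minimal sketch (assumptions: squared loss, which has a closed-form minimizer, and a two-cluster mixture; the paper's theorem covers a far broader class of models and losses):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 50

# Mixture data: two homogeneous Gaussian clouds centered at +/- mu.
mu = np.zeros(d); mu[0] = 2.0
signs = rng.choice([-1.0, 1.0], size=n)
X_mix = signs[:, None] * mu + rng.standard_normal((n, d))

# Gaussian surrogate with matching covariance.
cov = np.cov(X_mix, rowvar=False)
X_gauss = rng.multivariate_normal(np.zeros(d), cov, size=n)

y = rng.choice([-1.0, 1.0], size=n)              # random labels

def min_train_loss(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares minimizer
    return np.mean((X @ w - y) ** 2)

loss_mix, loss_gauss = min_train_loss(X_mix, y), min_train_loss(X_gauss, y)
print(f"mixture: {loss_mix:.3f}  gaussian: {loss_gauss:.3f}")  # nearly equal
```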
arXiv Detail & Related papers (2022-05-26T12:25:24Z) - Harmless interpolation in regression and classification with structured
features [21.064512161584872]
Overparametrized neural networks tend to perfectly fit noisy training data yet generalize well on test data.
We present a general and flexible framework for upper bounding regression and classification risk in a reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-11-09T15:12:26Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the outcome of a transformation is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Benign overfitting in ridge regression [0.0]
We provide non-asymptotic generalization bounds for overparametrized ridge regression.
We identify when small or negative regularization is sufficient for obtaining small generalization error.
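A hedged sketch of the setting: overparametrized ridge regression computed via the kernel (dual) form, which stays well-posed even for very small regularization when d >> n (assumptions: isotropic Gaussian design and a planted linear target; the paper's non-asymptotic bounds cover much more general spectra):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 1000                                  # more features than samples
w_star = rng.standard_normal(d) / np.sqrt(d)      # planted target direction
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Dual solution w = X^T (X X^T + lam I)^{-1} y: only an n x n solve,
    # stable even as lam -> 0 in the overparametrized regime.
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(len(y)), y)

X_test = rng.standard_normal((2000, d))
errs = {}
for lam in (1e-6, 1.0, 100.0):
    w = ridge(X, y, lam)
    errs[lam] = np.mean((X_test @ (w - w_star)) ** 2)
    print(f"lam={lam:g}  test MSE={errs[lam]:.3f}")
```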
arXiv Detail & Related papers (2020-09-29T20:00:31Z) - Self-training Avoids Using Spurious Features Under Domain Shift [54.794607791641745]
In unsupervised domain adaptation, conditional entropy minimization and pseudo-labeling work even when the domain shifts are much larger than those analyzed by existing theory.
We identify and analyze one particular setting where the domain shift can be large, but certain spurious features correlate with the label in the source domain while being independent of the label in the target domain.
arXiv Detail & Related papers (2020-06-17T17:51:42Z) - Eigendecomposition-Free Training of Deep Networks for Linear
Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
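A minimal sketch of the eigendecomposition-free idea for the smallest-eigenvector problem (assumption: plain projected descent on w^T A^T A w over the unit sphere; the paper embeds this as a differentiable loss inside a deep network rather than running it as a standalone loop):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 5))
M = A.T @ A

w = rng.standard_normal(5)
w /= np.linalg.norm(w)
for _ in range(2000):
    w -= 0.001 * (M @ w)            # descent step on w^T M w
    w /= np.linalg.norm(w)          # stay on the unit sphere

# Check against the smallest eigenvalue from an explicit eigendecomposition,
# which the loop itself never computes.
lam_min = np.linalg.eigvalsh(M)[0]
print(w @ M @ w, lam_min)
```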
arXiv Detail & Related papers (2020-04-15T04:29:34Z) - Neural Networks and Polynomial Regression. Demystifying the
Overparametrization Phenomena [17.205106391379026]
In the context of neural network models, overparametrization refers to the phenomenon whereby these models appear to generalize well on unseen data.
A conventional explanation of this phenomenon is based on self-regularization properties of the algorithms used to fit the data.
We show that any student network interpolating the data generated by a teacher network generalizes well, provided that the sample size is at least an explicit quantity controlled by data dimension.
arXiv Detail & Related papers (2020-03-23T20:09:31Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.