Generalization Error of Generalized Linear Models in High Dimensions
- URL: http://arxiv.org/abs/2005.00180v1
- Date: Fri, 1 May 2020 02:17:47 GMT
- Title: Generalization Error of Generalized Linear Models in High Dimensions
- Authors: Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep
Rangan, Alyson K. Fletcher
- Abstract summary: We provide a framework to characterize the asymptotic generalization error of single-layer neural networks (generalized linear models) with arbitrary non-linearities.
We analyze the effect of over-parameterization during modeling and of the loss function, initialization, and regularizer during learning.
Our model also captures mismatch between training and test distributions; linear regression and logistic regression are analyzed as special cases.
- Score: 25.635225717360466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At the heart of machine learning lies the question of generalizability of
learned rules over previously unseen data. While over-parameterized models
based on neural networks are now ubiquitous in machine learning applications,
our understanding of their generalization capabilities is incomplete. This task
is made harder by the non-convexity of the underlying learning problems. We
provide a general framework to characterize the asymptotic generalization error
for single-layer neural networks (i.e., generalized linear models) with
arbitrary non-linearities, making it applicable to regression as well as
classification problems. This framework enables analyzing the effect of (i)
over-parameterization and non-linearity during modeling; and (ii) choices of
loss function, initialization, and regularizer during learning. Our model also
captures mismatch between training and test distributions. As examples, we
analyze a few special cases, namely linear regression and logistic regression.
We are also able to rigorously and analytically explain the \emph{double
descent} phenomenon in generalized linear models.
Related papers
- When are dynamical systems learned from time series data statistically accurate? [2.2577735334028923]
We propose an ergodic theoretic approach to generalization of complex dynamical models learned from time series data.
Our main contribution is to define and analyze generalization of neural representations of classes of ergodic systems, including chaotic systems.
arXiv Detail & Related papers (2024-11-09T23:44:17Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
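As a concrete complement to the ridge-regression results summarized above (an assumed toy setup, not taken from that paper), the following sketch uses the closed-form ridge estimator to show how a small penalty stabilizes the fit at the interpolation threshold n = p, where unregularized least squares is most unstable:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_test_mse(lam, n=60, p=60, noise=0.5, n_test=1000):
    """Test MSE of ridge regression on a fresh random-design dataset
    at the interpolation threshold n = p."""
    w_star = rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ w_star + noise * rng.standard_normal(n)
    Xte = rng.standard_normal((n_test, p))
    yte = Xte @ w_star + noise * rng.standard_normal(n_test)
    # Closed-form ridge estimator: (X^T X + lam * n * I)^{-1} X^T y
    w_hat = np.linalg.solve(X.T @ X + lam * n * np.eye(p), X.T @ y)
    return np.mean((Xte @ w_hat - yte) ** 2)

# Average over independent draws: a modest penalty tames the variance
# blow-up caused by near-singular design matrices at n = p.
mse_ols = np.mean([ridge_test_mse(1e-8) for _ in range(10)])
mse_ridge = np.mean([ridge_test_mse(0.1) for _ in range(10)])
```

The near-zero penalty behaves like ordinary least squares and inherits its instability; the regularized fit has markedly lower test error at the threshold.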
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Towards Data-Algorithm Dependent Generalization: a Case Study on
Overparameterized Linear Regression [19.047997113063147]
We introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory.
We perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting.
arXiv Detail & Related papers (2022-02-12T12:42:36Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
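The decomposition described above can be checked directly: grouping inputs by ReLU activation pattern and averaging each group's error, weighted by its empirical frequency, recovers the full network's empirical error exactly. A small sketch with an arbitrary random one-hidden-layer network (all weights, data, and the target are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny one-hidden-layer ReLU network with fixed random weights.
W1 = rng.standard_normal((2, 8))
b1 = rng.standard_normal(8)
w2 = rng.standard_normal(8)

def forward(X):
    return np.maximum(X @ W1 + b1, 0.0) @ w2

def activation_pattern(X):
    # Which hidden units are active: this sign pattern selects the
    # subfunction (affine piece) containing each input.
    return (X @ W1 + b1 > 0).astype(int)

X = rng.standard_normal((500, 2))
y = np.sin(X[:, 0])                     # arbitrary regression target
sq_err = (forward(X) - y) ** 2

# Group samples by activation pattern; the full empirical error is the
# frequency-weighted average of each subfunction's empirical error.
patterns = [tuple(p) for p in activation_pattern(X)]
total = 0.0
for pat in set(patterns):
    mask = np.array([p == pat for p in patterns])
    total += mask.mean() * sq_err[mask].mean()

# total matches sq_err.mean() up to floating-point error
```

This identity is exact because the pattern groups partition the training set; the cited paper builds on such per-subfunction errors to flag unreliable predictions.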
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Lower Bounds on the Generalization Error of Nonlinear Learning Models [2.1030878979833467]
We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data.
We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime.
We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks.
arXiv Detail & Related papers (2021-03-26T20:37:54Z) - Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We give a \emph{precise} characterization of the Hessian eigenspectra for a broad family of nonlinear models.
Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z) - The Neural Tangent Kernel in High Dimensions: Triple Descent and a
Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a \emph{double descent} curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks with gradient descent.
arXiv Detail & Related papers (2020-08-15T20:55:40Z) - Spectral Bias and Task-Model Alignment Explain Generalization in Kernel
Regression and Infinitely Wide Neural Networks [17.188280334580195]
Generalization beyond a training dataset is a main goal of machine learning.
Recent observations in deep neural networks contradict conventional wisdom from classical statistics.
We show that more data may impair generalization when the target function is noisy or not expressible by the kernel.
arXiv Detail & Related papers (2020-06-23T17:53:11Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.