Related papers: Lower Bounds on the Generalization Error of Nonlinear Learning Models

URL: http://arxiv.org/abs/2103.14723v1
Date: Fri, 26 Mar 2021 20:37:54 GMT
Title: Lower Bounds on the Generalization Error of Nonlinear Learning Models
Authors: Inbar Seroussi, Ofer Zeitouni
Abstract summary: We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime. We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks.
Score: 2.1030878979833467
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime. We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks. In the linear case the bound is asymptotically tight. In the nonlinear case, we provide a comparison of our bounds with an empirical study of the stochastic gradient descent algorithm. The analysis uses elements from the theory of large random matrices.
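To make the regime in the abstract concrete, the following is a minimal sketch of how an unbiased estimator degrades when the number of parameters is commensurate with the number of samples. It uses the classical closed form for ordinary least squares with isotropic Gaussian inputs and noise variance sigma^2, where the expected excess risk is sigma^2 * d / (n - d - 1) for n > d + 1. This is a standard textbook identity, not the lower bound derived in the paper, and the function name `ols_excess_risk` is hypothetical.

```python
def ols_excess_risk(n: int, d: int, sigma: float = 1.0) -> float:
    """Expected excess risk of ordinary least squares (an unbiased estimator).

    Assumptions (not taken from the paper): inputs are i.i.d. N(0, I_d),
    the noise has variance sigma**2, and n > d + 1 so the expectation is finite.
    """
    if n <= d + 1:
        raise ValueError("the closed form requires n > d + 1")
    return sigma ** 2 * d / (n - d - 1)


if __name__ == "__main__":
    n = 1000
    # As d approaches n, the excess risk grows without bound.
    for d in (100, 500, 900, 990):
        print(f"d/n = {d / n:.2f}  excess risk = {ols_excess_risk(n, d):.3f}")
```

With n fixed, the risk is small when d is much smaller than n and blows up as d approaches n, which is the kind of behaviour that motivates studying biased estimators in this regime.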

Related papers

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models [51.85815025140659]
Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models.
arXiv Detail & Related papers (2025-06-16T06:54:08Z)
Importance Sampling for Nonlinear Models [5.421981644827842]
We introduce the concept of the adjoint operator of a nonlinear map. We demonstrate that sampling based on these notions of norm and leverage scores provides approximation guarantees for the underlying nonlinear mapping.
arXiv Detail & Related papers (2025-05-18T10:34:39Z)
Gradient descent inference in empirical risk minimization [1.1510009152620668]
Gradient descent is one of the most widely used iterative algorithms in modern statistical learning. This paper provides a precise, non-asymptotic characterization of gradient descent in a broad class of empirical risk minimization problems.
arXiv Detail & Related papers (2024-12-12T17:47:08Z)
Generalization for Least Squares Regression With Simple Spiked Covariances [3.9134031118910264]
The generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood. Recent work has made progress by describing the spectrum of the feature matrix at the hidden layer. Yet, the generalization error for linear models with spiked covariances has not been previously determined.
arXiv Detail & Related papers (2024-10-17T19:46:51Z)
Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks [28.437011792990347]
This paper studies the binary classification of data from $\mathbb{R}^d$ generated under Gaussian mixture models using deep ReLU networks. We obtain, for the first time, convergence rates for this setting. Results provide a theoretical verification of deep neural networks in practical classification problems.
arXiv Detail & Related papers (2023-08-15T20:40:42Z)
Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets. This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data [37.908159361149835]
We study normalized gradient descent on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to linear rate of convergence of the training loss to the global optimum.
arXiv Detail & Related papers (2023-05-22T20:30:10Z)
Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
Adaptive deep learning for nonlinear time series models [0.0]
We develop a theory for adaptive nonparametric estimation of the mean function of a non-stationary and nonlinear time series model using deep neural networks (DNNs). We derive minimax lower bounds for estimating mean functions belonging to a wide class of nonlinear autoregressive (AR) models.
arXiv Detail & Related papers (2022-07-06T09:58:58Z)
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data. We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
Hessian Eigenspectra of More Realistic Nonlinear Models [73.31363313577941]
We make a precise characterization of the Hessian eigenspectra for a broad family of nonlinear models. Our analysis takes a step forward to identify the origin of many striking features observed in more complex machine learning models.
arXiv Detail & Related papers (2021-03-02T06:59:52Z)
Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime, and a non-sparse regime. We show that by relying on a different, new property of the solutions, it is still possible to provide dimension free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Nonlinear Learned Iterative Shrinkage Thresholding Algorithm (NLISTA). Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z)
Generalization Error of Generalized Linear Models in High Dimensions [25.635225717360466]
We provide a framework to characterize neural networks with arbitrary non-linearities. We analyze the effect of regular logistic regression on learning. Our model also captures mismatch between training and test distributions.
arXiv Detail & Related papers (2020-05-01T02:17:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.