Related papers: Dimension Independent Generalization Error by Stochastic Gradient Descent

URL: http://arxiv.org/abs/2003.11196v2
Date: Mon, 4 Jan 2021 06:13:47 GMT
Title: Dimension Independent Generalization Error by Stochastic Gradient Descent
Authors: Xi Chen and Qiang Liu and Xin T. Tong
Abstract summary: We present a theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions. We show that, under a low effective dimension, the generalization error either does not depend on the ambient dimension $p$ or depends on $p$ via a poly-logarithmic factor.
Score: 12.474236773219067
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One classical canon of statistics is that large models are prone to overfitting, and model selection procedures are necessary for high dimensional data. However, many overparameterized models, such as neural networks, perform very well in practice, although they are often trained with simple online methods and regularization. The empirical success of overparameterized models, which is often known as benign overfitting, motivates us to have a new look at the statistical generalization theory for online optimization. In particular, we present a general theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions. We further discuss data and model conditions that lead to a ``low effective dimension''. Under these conditions, we show that the generalization error either does not depend on the ambient dimension $p$ or depends on $p$ via a poly-logarithmic factor. We also demonstrate that in several widely used statistical models, the ``low effective dimension'' arises naturally in overparameterized settings. The studied statistical applications include both convex models such as linear regression and logistic regression and non-convex models such as $M$-estimator and two-layer neural networks.

Related papers

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models [51.85815025140659]
Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models.
arXiv Detail & Related papers (2025-06-16T06:54:08Z)
Are Statistical Methods Obsolete in the Era of Deep Learning? [0.8329456268842228]
In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? We show that statistical methods are far from obsolete, especially when working with sparse and noisy observations.
arXiv Detail & Related papers (2025-05-27T20:11:21Z)
Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression [5.801904710149222]
In machine learning, the scaling law describes how model performance improves as the model and data size scale up. This paper studies the scaling law for linear regression with a quadratically parameterized model. As a result, in canonical linear regression, we provide explicit separations between the generalization curves with and without feature learning, and an information-theoretic lower bound that is agnostic to the parametrization method and the algorithm.
arXiv Detail & Related papers (2025-02-13T09:29:04Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes. In this paper we examine the use of convex neural recovery models. We show that all the stationary non-dimensional objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Analysis of Interpolating Regression Models and the Double Descent Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly. The best models obtained are overparametrized, and the testing error exhibits double descent behavior as the model order increases. We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
arXiv Detail & Related papers (2023-04-17T09:44:33Z)
Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features. We find new and interesting properties that do not exist in single-task linear regression. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
RMFGP: Rotated Multi-fidelity Gaussian process with Dimension Reduction for High-dimensional Uncertainty Quantification [12.826754199680474]
Multi-fidelity modelling enables accurate inference even when only a small set of accurate data is available. By combining the realizations of the high-fidelity model with one or more low-fidelity models, the multi-fidelity method can make accurate predictions of quantities of interest. This paper proposes a new dimension reduction framework based on rotated multi-fidelity Gaussian process regression and a Bayesian active learning scheme.
arXiv Detail & Related papers (2022-04-11T01:20:35Z)
Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints [0.0]
This article introduces a novel nonparametric methodology for Generalized Linear Models. It combines the strengths of the binary regression and latent variable formulations for categorical data. It extends recently published parametric versions of the methodology and generalizes it.
arXiv Detail & Related papers (2021-10-11T04:49:59Z)
Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models. We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data. We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
Non-parametric Models for Non-negative Functions [48.7576911714538]
We provide the first model for non-negative functions which benefits from the same good properties of linear models. We prove that it admits a representer theorem and provide an efficient dual formulation for convex problems.
arXiv Detail & Related papers (2020-07-08T07:17:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.