Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation
- URL: http://arxiv.org/abs/2502.15752v2
- Date: Wed, 02 Apr 2025 11:29:34 GMT
- Title: Universality of High-Dimensional Logistic Regression and a Novel CGMT under Dependence with Applications to Data Augmentation
- Authors: Matthew Esmaili Mallory, Kevin Han Huang, Morgane Austern,
- Abstract summary: We prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence.<n>We establish the impact of data augmentation, a widespread practice in deep learning, on the risk.
- Score: 6.092792437962955
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the last decade, a wave of research has characterized the exact asymptotic risk of many high-dimensional models in the proportional regime. Two foundational results have driven this progress: Gaussian universality, which shows that the asymptotic risk of estimators trained on non-Gaussian and Gaussian data is equivalent, and the convex Gaussian min-max theorem (CGMT), which characterizes the risk under Gaussian settings. However, these results rely on the assumption that the data consists of independent random vectors--an assumption that significantly limits its applicability to many practical setups. In this paper, we address this limitation by generalizing both results to the dependent setting. More precisely, we prove that Gaussian universality still holds for high-dimensional logistic regression under block dependence, $m$-dependence and special cases of mixing, and establish a novel CGMT framework that accommodates for correlation across both the covariates and observations. Using these results, we establish the impact of data augmentation, a widespread practice in deep learning, on the asymptotic risk.
Related papers
- Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula [17.036996839737828]
We show how an algorithm can be used to prove the optimality of a non-dimensional Gaussian regularization model.<n>We also show how we can use the Tukey loss to prove the optimality of a negative regularization model.
arXiv Detail & Related papers (2025-02-27T11:29:43Z) - The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures [6.863637695977277]
We provide a high-dimensional characterization of empirical risk minimization for classification under a general mixture data setting.
We specify conditions for Gaussian universality and discuss their implications for the choice of loss function.
arXiv Detail & Related papers (2024-10-08T01:45:37Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.<n>We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk.<n>We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Ridge interpolators in correlated factor regression models -- exact risk analysis [0.0]
We consider correlated emphfactor regression models (FRM) and analyze the performance of classical ridge interpolators.
We provide emphexcess prediction risk characterizations that clearly show the dependence on all key model parameters.
arXiv Detail & Related papers (2024-06-13T14:46:08Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by X, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Gaussian Universality of Linear Classifiers with Random Labels in
High-Dimension [24.503842578208268]
We prove that data coming from a range of generative models in high-dimensions have the same minimum training loss as Gaussian data with corresponding data covariance.
In particular, our theorem covers data created by an arbitrary mixture of homogeneous Gaussian clouds, as well as multi-modal generative neural networks.
arXiv Detail & Related papers (2022-05-26T12:25:24Z) - Gaussian and Non-Gaussian Universality of Data Augmentation [6.51882364384472]
We quantify how data augmentation affects the variance and limiting distribution of estimates through simple surrogates.
Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk.
As our main theoretical tool, we develop an adaptation of Lindeberg's technique for block dependence.
arXiv Detail & Related papers (2022-02-18T11:32:41Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD)
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exacts characterising the estimator in high-dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.