Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation
- URL: http://arxiv.org/abs/2302.08923v1
- Date: Fri, 17 Feb 2023 14:56:40 GMT
- Title: Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation
- Authors: Luca Pesce, Florent Krzakala, Bruno Loureiro, Ludovic Stephan
- Abstract summary: We consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model.
Motivated by the recent stream of results on the universality of the test and training errors in generalized linear estimation, we ask ourselves the question: "when is a single Gaussian enough to characterize the error?"
- Score: 24.933476324230377
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this manuscript we consider the problem of generalized linear estimation
on Gaussian mixture data with labels given by a single-index model. Our first
result is a sharp asymptotic expression for the test and training errors in the
high-dimensional regime. Motivated by the recent stream of results on the
Gaussian universality of the test and training errors in generalized linear
estimation, we ask ourselves the question: "when is a single Gaussian enough to
characterize the error?". Our formula allows us to give sharp answers to this
question, both in the positive and negative directions. More precisely, we show
that the sufficient conditions for Gaussian universality (or lack thereof)
crucially depend on the alignment between the target weights and the means and
covariances of the mixture clusters, which we precisely quantify. In the
particular case of least-squares interpolation, we prove a strong universality
property of the training error, and show it follows a simple, closed-form
expression. Finally, we apply our results to real datasets, clarifying some
recent discussion in the literature about Gaussian universality of the errors
in this context.
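As a numerical companion to the abstract's question, the sketch below (our illustration, not the authors' analytical method; all names and parameter choices are ours) fits ridge regression on two-cluster Gaussian mixture data with single-index labels, repeats the fit on single-Gaussian data with matched mean and covariance, and compares the test errors:
```python
# Numerical probe of Gaussian universality: ridge regression on Gaussian
# mixture data vs. single-Gaussian data with matched first/second moments.
# Hypothetical illustration; parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 200                                 # samples, dimension
w_star = rng.standard_normal(d) / np.sqrt(d)     # target weights (single index)
mu = np.ones(d) / np.sqrt(d)                     # cluster mean direction

def sample_mixture(n):
    """Two symmetric Gaussian clusters at +/- mu with identity covariance."""
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs[:, None] * mu + rng.standard_normal((n, d))

def labels(X):
    """Single-index model: y = f(<x, w_star>), here f = sign."""
    return np.sign(X @ w_star)

def ridge_test_error(X, y, Xt, yt, lam=1e-2):
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return np.mean(np.sign(Xt @ w) != yt)

# Mixture data: mean 0, covariance I + mu mu^T.
X, Xt = sample_mixture(n), sample_mixture(n)
err_mix = ridge_test_error(X, labels(X), Xt, labels(Xt))

def sample_matched(n):
    """Single Gaussian with the same mean (0) and covariance (I + mu mu^T)."""
    return rng.standard_normal((n, d)) + rng.standard_normal((n, 1)) * mu

Xg, Xgt = sample_matched(n), sample_matched(n)
err_gauss = ridge_test_error(Xg, labels(Xg), Xgt, labels(Xgt))

print(f"mixture test error: {err_mix:.3f}, matched-Gaussian: {err_gauss:.3f}")
```
Per the paper's result, whether the two errors agree should depend on the alignment between the target weights `w_star` and the cluster mean `mu`; rerunning the sketch with `w_star` proportional to `mu` is a natural way to probe the negative direction.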
Related papers
- The Breakdown of Gaussian Universality in Classification of High-dimensional Mixtures [6.863637695977277]
We provide a high-dimensional characterization of empirical risk minimization for classification under a general mixture data setting.
We specify conditions for Gaussian universality and discuss their implications for the choice of loss function.
arXiv Detail & Related papers (2024-10-08T01:45:37Z)
- Implicit Manifold Gaussian Process Regression [49.0787777751317]
Gaussian process regression is widely used to provide well-calibrated uncertainty estimates.
However, it struggles with high-dimensional data because of the implicit low-dimensional manifold upon which the data actually lie.
In this paper we propose a technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way.
arXiv Detail & Related papers (2023-10-30T09:52:48Z)
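For context on the entry above, here is a minimal Gaussian process regression in plain numpy: the vanilla baseline that the paper extends. It uses a fixed RBF kernel and infers no manifold structure; all names and parameters are ours:
```python
# Minimal GP regression with an RBF kernel on a 1-D toy problem.
# Baseline only; not the paper's manifold-inference method.
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ell=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / ell**2)

X = rng.uniform(-3, 3, size=(25, 1))               # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(25)
Xs = np.linspace(-4, 4, 200)[:, None]              # test inputs

sigma2 = 0.1**2                                    # observation noise variance
K = rbf(X, X) + sigma2 * np.eye(len(X))
Ks = rbf(X, Xs)

mean = Ks.T @ np.linalg.solve(K, y)                # posterior mean
var = rbf(Xs, Xs).diagonal() - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)

print("posterior mean/std at x~0:", mean[100], np.sqrt(var[100]))
```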
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of solutions minimizing the trace of the Hessian in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
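The equivalence above involves the Schatten 1-norm (nuclear norm) of the end-to-end matrix W_L ... W_1. A small sketch of computing that quantity for a random deep linear network (illustrative only; the layer sizes are arbitrary):
```python
# End-to-end matrix of a deep linear network and its Schatten 1-norm
# (nuclear norm), the quantity the entry above identifies.
import numpy as np

rng = np.random.default_rng(2)
dims = [20, 30, 30, 20]                       # layer widths, depth 3
Ws = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
      for i in range(len(dims) - 1)]

E = Ws[0]
for W in Ws[1:]:                              # end-to-end matrix W_L ... W_1
    E = W @ E

nuclear_norm = np.linalg.norm(E, ord="nuc")   # sum of singular values
print(f"Schatten 1-norm of end-to-end matrix: {nuclear_norm:.3f}")
```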
- Deterministic equivalent and error universality of deep random features learning [4.8461049669050915]
This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures.
First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp formula for it.
Second, we conjecture the universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures.
arXiv Detail & Related papers (2023-02-01T12:37:10Z)
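The shallow random features model that the entry above generalizes to deeper architectures fits in a few lines: a frozen random first layer followed by ridge regression on the nonlinear features. A hedged sketch with our own parameter choices:
```python
# Shallow random features regression: frozen random weights F, tanh
# nonlinearity, ridge fit on the features. Illustrative parameters.
import numpy as np

rng = np.random.default_rng(3)
n, d, p = 1000, 100, 300                       # samples, input dim, features
w_star = rng.standard_normal(d) / np.sqrt(d)

X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

F = rng.standard_normal((p, d)) / np.sqrt(d)   # frozen random first layer

def features(X):
    return np.tanh(X @ F.T)

lam = 1e-2
Z = features(X)
a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)  # ridge on features

Xt = rng.standard_normal((n, d))
test_mse = np.mean((features(Xt) @ a - Xt @ w_star) ** 2)
print(f"random features test MSE: {test_mse:.4f}")
```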
- Gaussian Universality of Linear Classifiers with Random Labels in High-Dimension [24.503842578208268]
We prove that data coming from a range of generative models in high dimensions have the same minimum training loss as Gaussian data with the corresponding data covariance.
In particular, our theorem covers data created by an arbitrary mixture of homogeneous Gaussian clouds, as well as multi-modal generative neural networks.
arXiv Detail & Related papers (2022-05-26T12:25:24Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z)
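As a baseline for the entry above, here is vanilla EM for a two-component one-dimensional Gaussian mixture with complete data; the paper's algorithm extends this template to elliptical components and missing entries, neither of which this sketch implements:
```python
# Vanilla EM for a 1-D two-component Gaussian mixture (complete data).
# Baseline template only; not the paper's robust missing-data variant.
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1.5, 700)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    # E-step: responsibility of component 1 (shared constants cancel).
    p0 = (1 - pi) * np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
    p1 = pi * np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
    r = p1 / (p0 + p1)
    # M-step: re-estimate mixing weight, means, and standard deviations.
    pi = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sigma = np.sqrt([np.average((x - mu[0]) ** 2, weights=1 - r),
                     np.average((x - mu[1]) ** 2, weights=r)])

print(f"weights {1-pi:.2f}/{pi:.2f}, means {mu.round(2)}, stds {sigma.round(2)}")
```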
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying an integrity metric: the empirical model error should match the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
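The integrity criterion above can be illustrated empirically, without the paper's random matrix machinery: sweep the train fraction of a linear regression and compare the held-out MSE to the known noise variance. A toy sketch under our own parameter choices:
```python
# Empirical version of the integrity criterion: where does held-out MSE
# come closest to the true measurement-noise variance?
import numpy as np

rng = np.random.default_rng(5)
n, d, noise = 500, 20, 0.5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + noise * rng.standard_normal(n)

for frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    m = int(frac * n)                           # train on first m samples
    w = np.linalg.lstsq(X[:m], y[:m], rcond=None)[0]
    mse = np.mean((X[m:] @ w - y[m:]) ** 2)     # held-out error
    print(f"train fraction {frac:.1f}: held-out MSE {mse:.3f} "
          f"(noise variance {noise**2:.3f})")
```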
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
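A minimal version of the random walk Laplacian embedding described above, on a toy two-block graph (not the paper's degree-corrected setting; parameters are ours):
```python
# Spectral embedding with the random walk Laplacian L_rw = I - D^{-1} A
# on a toy two-block random graph; the sign of the Fiedler-like eigenvector
# should recover the two communities.
import numpy as np

rng = np.random.default_rng(6)
n = 100                                        # nodes, two blocks of 50
blocks = np.repeat([0, 1], n // 2)
P = np.where(blocks[:, None] == blocks[None, :], 0.20, 0.02)
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops

deg = np.maximum(A.sum(1), 1)                  # guard against isolated nodes
L_rw = np.eye(n) - A / deg[:, None]            # random walk Laplacian

vals, vecs = np.linalg.eig(L_rw)
order = np.argsort(vals.real)
fiedler = vecs[:, order[1]].real               # second-smallest eigenvalue

pred = (fiedler > 0).astype(int)               # sign recovers the blocks
acc = max(np.mean(pred == blocks), np.mean(pred != blocks))
print(f"community recovery accuracy: {acc:.2f}")
```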
- Asymptotics of Ridge Regression in Convolutional Models [26.910291664252973]
We derive exact formulae for the estimation error of ridge estimators that hold in a certain high-dimensional regime.
Our experiments on convolutional models exhibit the double descent phenomenon, and our theoretical results match the experiments.
arXiv Detail & Related papers (2021-03-08T05:56:43Z)
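The double descent phenomenon mentioned above is easy to reproduce with generic linear features (this is not the convolutional model the paper analyzes): the test error of the minimum-norm least-squares fit peaks near the interpolation threshold d = n.
```python
# Double descent for least-squares regression: sweep the ratio d/n and
# watch the test error peak near d = n. Generic Gaussian features only.
import numpy as np

rng = np.random.default_rng(7)
n, noise = 200, 0.5

for d in (50, 100, 180, 200, 220, 400, 800):
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X = rng.standard_normal((n, d))
    y = X @ w_star + noise * rng.standard_normal(n)
    # lstsq gives the least-squares fit for d < n and the minimum-norm
    # interpolator for d > n.
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    Xt = rng.standard_normal((1000, d))
    err = np.mean((Xt @ w - Xt @ w_star) ** 2)
    print(f"d/n = {d/n:4.1f}: test MSE {err:.3f}")
```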
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
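A crude numerical version of the methodology above: in the overparameterized regime, parameterize the interpolating linear models as the minimum-norm solution plus a null-space component, sample many of them, and look at the spread of their test errors (all choices below are ours):
```python
# Sample many linear models that interpolate the training data exactly
# (min-norm solution plus random null-space directions) and measure the
# spread of their test errors. Illustrative only.
import numpy as np

rng = np.random.default_rng(8)
n, d = 50, 200                                  # overparameterized: d > n
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)
Xt = rng.standard_normal((2000, d))
yt = np.sign(Xt @ w_star)

w_mn = np.linalg.lstsq(X, y, rcond=None)[0]     # min-norm interpolator
_, _, Vt = np.linalg.svd(X, full_matrices=True)
N = Vt[n:].T                                    # (d, d - n) null-space basis

errs = []
for _ in range(500):
    c = 0.1 * rng.standard_normal(d - n)        # random null-space component
    w = w_mn + N @ c
    assert np.allclose(np.sign(X @ w), y)       # still interpolates exactly
    errs.append(np.mean(np.sign(Xt @ w) != yt))

print(f"typical test error {np.mean(errs):.3f} +/- {np.std(errs):.3f}, "
      f"worst sampled {np.max(errs):.3f}")
```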
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.