Asymptotic errors for convex penalized linear regression beyond Gaussian
matrices
- URL: http://arxiv.org/abs/2002.04372v1
- Date: Tue, 11 Feb 2020 13:43:32 GMT
- Title: Asymptotic errors for convex penalized linear regression beyond Gaussian
matrices
- Authors: C\'edric Gerbelot, Alia Abbara and Florent Krzakala
- Abstract summary: We consider the problem of learning a coefficient vector $x_0$ in $RN$ from noisy linear observations.
We provide a rigorous derivation of an explicit formula for the mean squared error.
We show that our predictions agree remarkably well with numerics even for very moderate sizes.
- Score: 23.15629681360836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning a coefficient vector $x_{0}$ in $R^{N}$
from noisy linear observations $y=Fx_{0}+w$ in $R^{M}$ in the high dimensional
limit $M,N$ to infinity with $\alpha=M/N$ fixed. We provide a rigorous
derivation of an explicit formula -- first conjectured using heuristic methods
from statistical physics -- for the asymptotic mean squared error obtained by
penalized convex regression estimators such as the LASSO or the elastic net,
for a class of very generic random matrices corresponding to rotationally
invariant data matrices with arbitrary spectrum. The proof is based on a
convergence analysis of an oracle version of vector approximate message-passing
(oracle-VAMP) and on the properties of its state evolution equations. Our
method leverages on and highlights the link between vector approximate
message-passing, Douglas-Rachford splitting and proximal descent algorithms,
extending previous results obtained with i.i.d. matrices for a large class of
problems. We illustrate our results on some concrete examples and show that
even though they are asymptotic, our predictions agree remarkably well with
numerics even for very moderate sizes.
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $Theta(-(a-1) + N-(a-1)/a)$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Entropic covariance models [0.7614628596146602]
We present a general framework for linear restrictions on different transformations of the covariance matrix.
Our proposed estimation method solves a convex problem and yields an $M$-estimator.
arXiv Detail & Related papers (2023-06-06T11:25:05Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data in terms of the bias and variance of ridge regression in terms of the bias and variance of an equivalent' sequence model.
As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - On the well-spread property and its relation to linear regression [4.619541348328937]
We show that consistent recovery of the parameter vector in a robust linear regression model is information-theoretically impossible.
We show that it is possible to efficiently certify whether a given $n$-by-$d$ matrix is well-spread if the number of observations is quadratic in the ambient dimension.
arXiv Detail & Related papers (2022-06-16T11:17:44Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric textitblock-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - Near optimal sample complexity for matrix and tensor normal models via
geodesic convexity [5.191641077435773]
We show nonasymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics.
In the same regimes as our sample complexity bounds, we show that an iterative procedure to compute the MLE known as the flip-flop algorithm converges linearly with high probability.
arXiv Detail & Related papers (2021-10-14T17:47:00Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $ktimes k$, rank-$r$, matrices to normalized $L_1$ distance requires $Omega(frackrepsilon2)$ samples.
We propose an algorithm that uses $cal O(frackrepsilon2log2fracepsilon)$ samples, a number linear in the high dimension, and nearly linear in the matrices, typically low, rank proofs.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Information-Theoretic Limits for the Matrix Tensor Product [8.206394018475708]
This paper studies a high-dimensional inference problem involving the matrix tensor product of random matrices.
On the technical side, this paper introduces some new techniques for the analysis of high-dimensional matrix-preserving signals.
arXiv Detail & Related papers (2020-05-22T17:03:48Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z) - Optimal Iterative Sketching with the Subsampled Randomized Hadamard
Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rate for Haar and randomized Hadamard matrices are identical, andally improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.