Mixture model for designs in high dimensional regression and the LASSO
- URL: http://arxiv.org/abs/1210.4762v3
- Date: Tue, 19 Dec 2023 20:54:03 GMT
- Title: Mixture model for designs in high dimensional regression and the LASSO
- Authors: Mohamed Ibrahim Assoweh, Emmanuel Caron and Stéphane Chrétien
- Abstract summary: The LASSO is a technique for variable selection in the regression model $y = X\beta + z$.
This paper proposes a mixture model for the design matrix which is able to capture in a natural way the potentially clustered nature of the columns.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The LASSO is a recent technique for variable selection in the regression
model
$$y = X\beta + z,$$
where $X \in \mathbb{R}^{n\times p}$ and $z \sim \mathcal{N}(0,\sigma^2 I)$ is a
centered Gaussian noise vector with i.i.d. entries. The LASSO has been proved to
achieve remarkable properties such as exact support recovery of sparse vectors
when the columns are sufficiently incoherent, and low prediction error under even
less stringent conditions. However, many design matrices encountered in practice
do not have small coherence, and the LASSO estimator may then suffer from what is
known as the slow rate regime.
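For concreteness, the coherence in question is the mutual coherence $\mu(X) = \max_{i\neq j} |\langle X_i, X_j\rangle| / (\|X_i\|\,\|X_j\\|)$, the largest absolute correlation between distinct columns. A minimal NumPy sketch (illustrative, not from the paper) computes it:

```python
import numpy as np

def mutual_coherence(X):
    """Largest absolute correlation between distinct columns of X."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # normalize columns
    G = np.abs(Xn.T @ Xn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-correlations
    return G.max()

rng = np.random.default_rng(0)
# An i.i.d. Gaussian design is mildly incoherent; clustered designs are not.
print(mutual_coherence(rng.standard_normal((100, 500))))
```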
The goal of the present paper is to study the LASSO from a slightly different
perspective, by proposing a mixture model for the design matrix that naturally
captures the potentially clustered nature of the columns in many practical
situations. In this model, the columns of the design matrix are
drawn from a Gaussian mixture model. Instead of requiring incoherence for the
design matrix $X$, we only require incoherence of the much smaller matrix of
the mixture's centers.
Our main result states that $X\beta$ can be estimated with the same precision
as for incoherent designs, up to a correction term depending on the maximal
variance in the mixture model.
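To make the setting concrete, here is a hedged simulation sketch; all parameter values (cluster count, spread, penalty level) are illustrative assumptions, not taken from the paper. The columns of $X$ are drawn from a Gaussian mixture with a few incoherent centers, so columns within a cluster are strongly correlated, yet the fitted value $X\hat\beta$ can still track $X\beta$:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, K, s = 200, 400, 5, 4          # sample size, dimension, clusters, sparsity

# Columns of X drawn from a Gaussian mixture: a small set of incoherent
# centers, each column being its cluster's center plus a small perturbation.
centers = rng.standard_normal((n, K)) / np.sqrt(n)   # K roughly unit-norm centers
labels = rng.integers(0, K, size=p)                  # cluster assignment per column
tau = 0.1                                            # within-cluster spread
X = centers[:, labels] + tau / np.sqrt(n) * rng.standard_normal((n, p))

beta = np.zeros(p)
beta[rng.choice(p, size=s, replace=False)] = 1.0     # s-sparse ground truth
y = X @ beta + 0.02 * rng.standard_normal(n)         # y = X beta + z

lasso = Lasso(alpha=0.01, max_iter=100_000).fit(X, y)
print("prediction error:", np.linalg.norm(X @ lasso.coef_ - X @ beta))
```

Note that with near-duplicate columns, exact support recovery of $\beta$ is hopeless (columns within a cluster are nearly interchangeable), which is consistent with the paper's guarantee being stated for the prediction $X\beta$ rather than for $\beta$ itself.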
Related papers
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - Entropic covariance models [0.7614628596146602]
We present a general framework for linear restrictions on different transformations of the covariance matrix.
Our proposed estimation method solves a convex problem and yields an $M$-estimator.
arXiv Detail & Related papers (2023-06-06T11:25:05Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions (a generic inverse-free sketch follows this list).
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - Dimension free ridge regression [10.434481202633458]
We revisit ridge regression on i.i.d. data, expressing its bias and variance in terms of the bias and variance of an 'equivalent' sequence model.
As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum.
arXiv Detail & Related papers (2022-10-16T16:01:05Z) - On the well-spread property and its relation to linear regression [4.619541348328937]
We show that, without a well-spread design, consistent recovery of the parameter vector in a robust linear regression model is information-theoretically impossible.
We show that it is possible to efficiently certify whether a given $n$-by-$d$ matrix is well-spread if the number of observations is quadratic in the ambient dimension.
arXiv Detail & Related papers (2022-06-16T11:17:44Z) - Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST) for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieved better energy compaction than the default discrete cosine transform (DCT) and better stability than the KLT.
arXiv Detail & Related papers (2022-03-02T15:36:44Z) - Statistical limits of dictionary learning: random matrix theory and the
spectral replica method [28.54289139061295]
We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting.
We introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined spectral replica method.
arXiv Detail & Related papers (2021-09-14T12:02:32Z) - Non-PSD Matrix Sketching with Applications to Regression and
Optimization [56.730993511802865]
We present dimensionality reduction methods for non-PSD and "square-roots" matrices.
We show how these techniques can be used for multiple downstream tasks.
arXiv Detail & Related papers (2021-06-16T04:07:48Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are large enough, the estimation errors will not be sacrificed by too much.
To the best of our knowledge, this is the first work that provides such guarantees for this model of stochastic linear combinations of non-linear regressions.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z)
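As a generic illustration of the inverse-free idea flagged in the probabilistic-unrolling entry above (and only that; this is not that paper's algorithm), the classical Hutchinson estimator combined with conjugate gradients evaluates quantities involving $\Sigma^{-1}$ through iterative linear solves rather than an explicit inverse:

```python
import numpy as np
from scipy.sparse.linalg import cg

def hutchinson_trace_inv(Sigma, n_probes=64, seed=0):
    """Estimate tr(Sigma^{-1}) with Rademacher probes, solving Sigma x = v
    by conjugate gradients instead of forming the inverse explicitly."""
    rng = np.random.default_rng(seed)
    d = Sigma.shape[0]
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=d)   # Rademacher probe vector
        x, info = cg(Sigma, v)                # iterative, inverse-free solve
        assert info == 0, "CG did not converge"
        total += v @ x                        # E[v^T Sigma^{-1} v] = tr(Sigma^{-1})
    return total / n_probes

d = 200
B = np.random.default_rng(1).standard_normal((d, d))
Sigma = B @ B.T / d + np.eye(d)               # well-conditioned SPD test matrix
print(hutchinson_trace_inv(Sigma), np.trace(np.linalg.inv(Sigma)))
```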
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.