Computational-Statistical Gaps in Gaussian Single-Index Models
- URL: http://arxiv.org/abs/2403.05529v2
- Date: Tue, 12 Mar 2024 22:27:02 GMT
- Title: Computational-Statistical Gaps in Gaussian Single-Index Models
- Authors: Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna
- Abstract summary: Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
- Score: 77.1473134227844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-Index Models are high-dimensional regression problems with planted
structure, whereby labels depend on an unknown one-dimensional projection of
the input via a generic, non-linear, and potentially non-deterministic
transformation. As such, they encompass a broad class of statistical inference
tasks, and provide a rich template to study statistical and computational
trade-offs in the high-dimensional regime.
While the information-theoretic sample complexity to recover the hidden
direction is linear in the dimension $d$, we show that computationally
efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree
Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$
samples, where $k^\star$ is a "generative" exponent associated with the model
that we explicitly characterize. Moreover, we show that this sample complexity
is also sufficient, by establishing matching upper bounds using a partial-trace
algorithm. Therefore, our results provide evidence of a sharp
computational-to-statistical gap (under both the SQ and LDP class) whenever
$k^\star>2$. To complete the study, we provide examples of smooth and Lipschitz
deterministic target functions with arbitrarily large generative exponents
$k^\star$.
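For concreteness, here is a sketch of how the generative exponent is typically defined in this Gaussian setting; the abstract does not spell out the definition, so the formulation below is an assumption based on the standard Hermite-analytic treatment of single-index models, with $z=\langle x,\theta^\star\rangle\sim\mathcal{N}(0,1)$, $h_k$ the normalized Hermite polynomials, and $T$ ranging over square-integrable label transformations:
$$ k^\star \;=\; \min\big\{\, k\ge 1 \;:\; \exists\, T\in L^2(\mathbb{P}_y) \text{ such that } \mathbb{E}\big[\,T(y)\,h_k(z)\,\big]\neq 0 \,\big\}. $$
Read this way, $k^\star$ is the lowest Hermite mode of the hidden projection that remains detectable after an arbitrary transformation of the label, and the stated bounds say that efficient SQ and LDP algorithms need $n=\Omega(d^{k^\star/2})$ samples even though $n=\Theta(d)$ samples suffice information-theoretically, so a computational-statistical gap opens whenever $k^\star>2$.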
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data [4.971690889257356]
We introduce an adaptation of the alternating minimization-descent scheme proposed by Collins et al. and by Nayer and Vaswani.
We show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data.
Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications.
arXiv Detail & Related papers (2023-08-08T17:56:20Z) - Subspace clustering in high-dimensions: Phase transitions &
Statistical-to-Computational gap [24.073221004661427]
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model.
We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
arXiv Detail & Related papers (2022-05-26T17:47:35Z) - Improving Computational Complexity in Statistical Models with
Second-Order Information [32.64382185990981]
We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
arXiv Detail & Related papers (2022-02-09T01:32:50Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Estimating Stochastic Linear Combination of Non-linear Regressions
Efficiently and Scalably [23.372021234032363]
We show that when the sub-sample sizes are too large, the estimation error deteriorates substantially.
To the best of our knowledge, this is the first work that provides guarantees for this model.
arXiv Detail & Related papers (2020-10-19T07:15:38Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.