Stability and Risk Bounds of Iterative Hard Thresholding
- URL: http://arxiv.org/abs/2203.09413v1
- Date: Thu, 17 Mar 2022 16:12:56 GMT
- Title: Stability and Risk Bounds of Iterative Hard Thresholding
- Authors: Xiao-Tong Yuan and Ping Li
- Abstract summary: We introduce a novel sparse generalization theory for IHT under the notion of algorithmic stability.
We show that IHT with sparsity level $k$ enjoys an $\mathcal{\tilde O}(n^{-1/2}\sqrt{k\log(n)\log(p)})$ rate of convergence in sparse excess risk.
Preliminary numerical evidence is provided to confirm our theoretical predictions.
- Score: 41.082982732100696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we analyze the generalization performance of the Iterative
Hard Thresholding (IHT) algorithm widely used for sparse recovery problems. The
parameter estimation and sparsity recovery consistency of IHT has long been
known in compressed sensing. From the perspective of statistical learning,
another fundamental question is how well the IHT estimation would predict on
unseen data. This paper makes progress towards answering this open question by
introducing a novel sparse generalization theory for IHT under the notion of
algorithmic stability. Our theory reveals that: 1) under natural conditions on
the empirical risk function over $n$ samples of dimension $p$, IHT with
sparsity level $k$ enjoys an $\mathcal{\tilde O}(n^{-1/2}\sqrt{k\log(n)\log(p)})$
rate of convergence in sparse excess risk;
2) a tighter $\mathcal{\tilde O}(n^{-1/2}\sqrt{\log(n)})$ bound can be
established by imposing an additional iteration stability condition on a
hypothetical IHT procedure invoked on the population risk; and 3) a fast rate
of order $\mathcal{\tilde O}\left(n^{-1}k(\log^3(n)+\log(p))\right)$ can be
derived for strongly convex risk functions under proper strong-signal
conditions. The results are instantiated for sparse linear regression and
sparse logistic regression models to demonstrate the applicability of our
theory. Preliminary numerical evidence is provided to confirm our theoretical
predictions.
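To make the object of analysis concrete, below is a minimal sketch of the IHT iteration the paper studies: a gradient step on the empirical risk followed by hard thresholding that keeps the $k$ largest-magnitude coordinates. The squared-loss objective, step-size rule, and variable names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    if k > 0:
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out[idx] = x[idx]
    return out

def iht(X, y, k, step=None, n_iter=200):
    """IHT for least squares: min ||y - X w||^2 / (2n) s.t. ||w||_0 <= k.
    A generic sketch, not the paper's exact procedure."""
    n, p = X.shape
    if step is None:
        # Conservative step size based on the spectral norm of X.
        step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n            # gradient of the empirical risk
        w = hard_threshold(w - step * grad, k)  # gradient step + sparse projection
    return w
```

For a sparse linear model $y = Xw^* + \varepsilon$ with $\|w^*\|_0 \le k$, the bounds above describe how the excess risk of the returned iterate scales with $n$, $p$, and $k$.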
Related papers
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- On the Stability and Generalization of Triplet Learning [55.75784102837832]
Triplet learning, i.e., learning from triplet data, has attracted much attention in computer vision tasks.
This paper investigates the generalization guarantees of triplet learning by leveraging the stability analysis.
arXiv Detail & Related papers (2023-02-20T07:32:50Z)
- Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
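This entry centers on the expectile loss, an asymmetric squared loss. As a point of reference, here is a minimal sketch of unpenalized expectile regression fit by plain gradient descent; the retire procedure itself is penalized and robustified and uses a semismooth Newton coordinate descent solver, which this sketch does not reproduce. All names and parameters are illustrative.

```python
import numpy as np

def expectile_loss_grad(X, y, w, tau):
    """Gradient of the asymmetric squared (expectile) loss
    rho_tau(r) = |tau - 1{r < 0}| * r^2, averaged over n samples."""
    r = y - X @ w
    weight = np.where(r < 0, 1.0 - tau, tau)  # asymmetric weights
    return -2.0 * X.T @ (weight * r) / len(y)

def expectile_regression(X, y, tau=0.8, step=1e-2, n_iter=5000):
    """Plain gradient descent on the expectile loss (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= step * expectile_loss_grad(X, y, w, tau)
    return w
```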
arXiv Detail & Related papers (2022-12-11T18:03:12Z)
- A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension [36.00360315353985]
CRT-logit is an algorithm that combines a variable-distillation step and a decorrelation step.
We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
arXiv Detail & Related papers (2022-05-29T09:37:16Z)
- Non-Asymptotic Guarantees for Robust Statistical Learning under $(1+\varepsilon)$-th Moment Assumption [0.716879432974126]
This paper proposes a log-truncated M-estimator for a large family of statistical regressions.
We show the superiority of log-truncated estimators over standard estimators.
arXiv Detail & Related papers (2022-01-10T06:22:30Z)
- Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of the gradients.
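Gradient clipping simply rescales each stochastic gradient to a maximum norm before the descent step, which guards the iterates against heavy-tailed gradient noise. The sketch below is a generic clipped-SGD loop under that assumption; the loss, threshold, and step size are illustrative, not the paper's.

```python
import numpy as np

def clip_by_norm(g, max_norm):
    """Rescale g so its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

def clipped_sgd(grad_fn, w0, n_steps, step=0.01, max_norm=1.0):
    """Generic clipped SGD: grad_fn(w, t) returns a stochastic gradient."""
    w = np.asarray(w0, dtype=float).copy()
    for t in range(n_steps):
        g = clip_by_norm(grad_fn(w, t), max_norm)  # truncate heavy-tailed noise
        w -= step * g
    return w
```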
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
- Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
arXiv Detail & Related papers (2021-06-10T06:11:55Z)
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
- Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance [14.06947898164194]
Heavy tails emerge in stochastic gradient descent (SGD) in various scenarios.
We provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance.
Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum.
arXiv Detail & Related papers (2021-02-20T13:45:11Z)
- Estimation in Tensor Ising Models [5.161531917413708]
We consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes.
In particular, we show the $\sqrt{N}$-consistency of the maximum pseudolikelihood (MPL) estimate in the $p$-spin Sherrington-Kirkpatrick (SK) model.
We derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie-Weiss model.
arXiv Detail & Related papers (2020-08-29T00:06:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.