Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates
and Model Misspecification
- URL: http://arxiv.org/abs/2201.01973v1
- Date: Thu, 6 Jan 2022 08:51:08 GMT
- Title: Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates
and Model Misspecification
- Authors: Saptarshi Chakraborty, Debolina Paul and Swagatam Das
- Abstract summary: We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $epsilon$, these estimators achieve an error rate of $O(maxleft|mathcalO|1/2n-1/2, |mathcalI|1/2n-1 right+epsilon)$, matching the best-known rates in literature.
- Score: 16.0817847880416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of linear predictions has been extensively studied for the past
century under pretty generalized frameworks. Recent advances in the robust
statistics literature allow us to analyze robust versions of classical linear
models through the prism of Median of Means (MoM). Combining these approaches
in a piecemeal way might lead to ad-hoc procedures, and the restricted
theoretical conclusions that underpin each individual contribution may no
longer be valid. To meet these challenges coherently, in this study, we offer a
unified robust framework that includes a broad variety of linear prediction
problems on a Hilbert space, coupled with a generic class of loss functions.
Notably, we do not require any assumptions on the distribution of the outlying
data points ($\mathcal{O}$) nor the compactness of the support of the inlying
ones ($\mathcal{I}$). Under mild conditions on the dual norm, we show that for
misspecification level $\epsilon$, these estimators achieve an error rate of
$O(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1}
\right\}+\epsilon)$, matching the best-known rates in literature. This rate is
slightly slower than the classical rates of $O(n^{-1/2})$, indicating that we
need to pay a price in terms of error rates to obtain robust estimates.
Additionally, we show that this rate can be improved to achieve so-called
``fast rates" under additional assumptions.
Related papers
- Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression [29.57766164934947]
We investigate popular resampling methods for estimating the uncertainty of statistical models.
We provide a tight description of the biases and variances estimated by these methods in the context of generalized linear models.
arXiv Detail & Related papers (2024-02-21T08:50:33Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - A Primal-Dual Approach to Solving Variational Inequalities with General Constraints [54.62996442406718]
Yang et al. (2023) recently showed how to use first-order gradient methods to solve general variational inequalities.
We prove the convergence of this method and show that the gap function of the last iterate of the method decreases at a rate of $O(frac1sqrtK)$ when the operator is $L$-Lipschitz and monotone.
arXiv Detail & Related papers (2022-10-27T17:59:09Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, non- probability problems.
We present our analysis in a modular way and obtain a complementary $mathcal O (1 / TT)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability for AdaGrad with a truly adaptive scheme, i.e., completely oblivious to the knowledge of smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node)
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Convergence Rates of Stochastic Gradient Descent under Infinite Noise
Variance [14.06947898164194]
Heavy tails emerge in gradient descent (SGD) in various scenarios.
We provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance.
Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum.
arXiv Detail & Related papers (2021-02-20T13:45:11Z) - Super fast rates in structured prediction [88.99819200562784]
We show how we can leverage the fact that discrete problems are essentially predicting a discrete output when continuous problems are predicting a continuous value.
We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction.
We then consider kernel ridge regression where we improve known rates in $n-1/4$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem.
arXiv Detail & Related papers (2021-02-01T10:50:04Z) - Suboptimality of Constrained Least Squares and Improvements via
Non-Linear Predictors [3.5788754401889014]
We study the problem of predicting as well as the best linear predictor in a bounded Euclidean ball with respect to the squared loss.
We discuss additional distributional assumptions sufficient to guarantee an $O(d/n)$ excess risk rate for the least squares estimator.
arXiv Detail & Related papers (2020-09-19T21:39:46Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first result of the optimal minimax guarantees for the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the universality error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.