Suboptimality of Constrained Least Squares and Improvements via Non-Linear Predictors
- URL: http://arxiv.org/abs/2009.09304v2
- Date: Sun, 7 Mar 2021 16:24:01 GMT
- Title: Suboptimality of Constrained Least Squares and Improvements via Non-Linear Predictors
- Authors: Tomas Vaškevičius and Nikita Zhivotovskiy
- Abstract summary: We study the problem of predicting as well as the best linear predictor in a bounded Euclidean ball with respect to the squared loss.
We discuss additional distributional assumptions sufficient to guarantee an $O(d/n)$ excess risk rate for the least squares estimator.
- Score: 3.5788754401889014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of predicting as well as the best linear predictor in a
bounded Euclidean ball with respect to the squared loss. When only boundedness
of the data generating distribution is assumed, we establish that the least
squares estimator constrained to a bounded Euclidean ball does not attain the
classical $O(d/n)$ excess risk rate, where $d$ is the dimension of the
covariates and $n$ is the number of samples. In particular, we construct a
bounded distribution such that the constrained least squares estimator incurs
an excess risk of order $\Omega(d^{3/2}/n)$, hence refuting a recent conjecture
of Ohad Shamir [JMLR 2015]. In contrast, we observe that non-linear predictors
can achieve the optimal rate $O(d/n)$ with no assumptions on the distribution
of the covariates. We discuss additional distributional assumptions sufficient
to guarantee an $O(d/n)$ excess risk rate for the least squares estimator.
Among them are certain moment equivalence assumptions often used in the robust
statistics literature. While such assumptions are central in the analysis of
unbounded and heavy-tailed settings, our work indicates that in some cases,
they also rule out unfavorable bounded distributions.
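The estimator under study can be illustrated with a minimal sketch: least squares constrained to a bounded Euclidean ball, computed here via projected gradient descent on the empirical squared loss. All function names, parameters, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def constrained_least_squares(X, y, radius=1.0, lr=0.01, n_iters=2000):
    """Least squares constrained to the Euclidean ball of the given radius,
    via projected gradient descent on the empirical squared loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / n   # gradient of (1/2n) * ||Xw - y||^2
        w -= lr * grad
        norm = np.linalg.norm(w)
        if norm > radius:              # Euclidean projection back onto the ball
            w *= radius / norm
    return w

# Toy check on a well-specified problem whose true weights lie inside the ball.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w_true = np.array([0.3, -0.2, 0.1])
y = X @ w_true + 0.01 * rng.normal(size=500)
w_hat = constrained_least_squares(X, y, radius=1.0)
```

The lower bound in the abstract concerns exactly this constrained estimator; the paper's point is that no matter how the minimizer over the ball is computed, its worst-case excess risk over bounded distributions is $\Omega(d^{3/2}/n)$, whereas suitable non-linear predictors achieve $O(d/n)$.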
Related papers
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
- $L^1$ Estimation: On the Optimality of Linear Estimators [64.76492306585168]
This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution.
arXiv Detail & Related papers (2023-09-17T01:45:13Z)
- Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V^*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z)
- New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma [10.176795938619417]
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy.
We show that estimating a covariance matrix in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors.
arXiv Detail & Related papers (2022-05-17T17:55:10Z)
- Robust Linear Predictions: Analyses of Uniform Concentration, Fast Rates and Model Misspecification [16.0817847880416]
We offer a unified framework that includes a broad variety of linear prediction problems on a Hilbert space.
We show that for misspecification level $\epsilon$, these estimators achieve an error rate of $O\left(\max\left\{|\mathcal{O}|^{1/2}n^{-1/2}, |\mathcal{I}|^{1/2}n^{-1}\right\} + \epsilon\right)$, matching the best-known rates in the literature.
arXiv Detail & Related papers (2022-01-06T08:51:08Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Distribution-Free Robust Linear Regression [5.532477732693]
We study random design linear regression with no assumptions on the distribution of the covariates.
We construct a non-linear estimator achieving excess risk of order $d/n$ with the optimal sub-exponential tail.
We prove an optimal version of the classical bound for the truncated least squares estimator due to Györfi, Kohler, Krzyzak, and Walk.
arXiv Detail & Related papers (2021-02-25T15:10:41Z)
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
- Robust Linear Regression: Optimal Rates in Polynomial Time [11.646151402884215]
We obtain robust and computationally efficient estimators for learning several linear models.
We identify an analytic condition that serves as a relaxation of independence of random variables.
Our central technical contribution is to algorithmically exploit independence of random variables in the "sum-of-squares" framework.
arXiv Detail & Related papers (2020-06-29T17:22:16Z)
- Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
- Risk of the Least Squares Minimum Norm Estimator under the Spike Covariance Model [0.0]
We study the risk of the minimum norm linear least squares estimator when the number of parameters $d$ depends on $n$ and $\frac{d}{n} \rightarrow \infty$.
We show that in this setting the risk of the minimum norm least squares estimator vanishes in comparison to the risk of the null estimator.
arXiv Detail & Related papers (2019-12-31T16:58:42Z)
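The minimum norm least squares estimator in the last entry above admits a short sketch: when $d > n$ there are infinitely many interpolating solutions of $Xw = y$, and the Moore-Penrose pseudoinverse picks the one of smallest Euclidean norm. The dimensions and random seed here are illustrative only.

```python
import numpy as np

# Overparametrized regime: d > n, so X w = y has infinitely many solutions.
rng = np.random.default_rng(1)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# The pseudoinverse returns the interpolating solution of minimum Euclidean norm.
w_min_norm = np.linalg.pinv(X) @ y

# A Gaussian X with n < d has full row rank almost surely,
# so the estimator interpolates the training data exactly.
residual = np.linalg.norm(X @ w_min_norm - y)
```

`np.linalg.lstsq` returns the same minimum-norm solution; the entry's claim concerns how the risk of this interpolating estimator behaves as $d/n \rightarrow \infty$ under a spike covariance model.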
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.