Risk reversal for least squares estimators under nested convex constraints
- URL: http://arxiv.org/abs/2601.16041v1
- Date: Thu, 22 Jan 2026 15:18:36 GMT
- Title: Risk reversal for least squares estimators under nested convex constraints
- Authors: Omar Al-Ghattas,
- Abstract summary: In constrained optimization, one naturally expects that imposing a stricter feasible set does not increase the statistical risk of an estimator defined by projection onto that set. We show that this intuition can fail even in canonical settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In constrained stochastic optimization, one naturally expects that imposing a stricter feasible set does not increase the statistical risk of an estimator defined by projection onto that set. In this paper, we show that this intuition can fail even in canonical settings. We study the Gaussian sequence model, a deliberately austere test bed, where for a compact, convex set $\Theta \subset \mathbb{R}^d$ one observes \[ Y = \theta^\star + \sigma Z, \qquad Z \sim N(0, I_d), \] and seeks to estimate an unknown parameter $\theta^\star \in \Theta$. The natural estimator is the least squares estimator (LSE), which coincides with the Euclidean projection of $Y$ onto $\Theta$. We construct an explicit example exhibiting \emph{risk reversal}: for sufficiently large noise, there exist nested compact convex sets $\Theta_S \subset \Theta_L$ and a parameter $\theta^\star \in \Theta_S$ such that the LSE constrained to $\Theta_S$ has strictly larger risk than the LSE constrained to $\Theta_L$. We further show that this phenomenon can persist at the level of worst-case risk, with the supremum risk over the smaller constraint set exceeding that over the larger one. We clarify this behavior by contrasting noise regimes. In the vanishing-noise limit, the risk admits a first-order expansion governed by the statistical dimension of the tangent cone at $\theta^\star$, and tighter constraints uniformly reduce risk. In contrast, in the diverging-noise regime, the risk is determined by global geometric interactions between the constraint set and random noise directions. Here, the embedding of $\Theta_S$ within $\Theta_L$ can reverse the risk ordering. These results reveal a previously unrecognized failure mode of projection-based estimators: in sufficiently noisy settings, tightening a constraint can paradoxically degrade statistical performance.
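The abstract's vanishing-noise claim can be made concrete with the usual definition of the statistical dimension, $\delta(C) = \mathbb{E}\,\|\Pi_C(Z)\|^2$ for a closed convex cone $C$ and $Z \sim N(0, I_d)$; in this notation the first-order expansion reads \[ \mathbb{E}\,\big\|\hat{\theta}_\Theta - \theta^\star\big\|^2 = \sigma^2\, \delta\big(T_\Theta(\theta^\star)\big) + o(\sigma^2), \qquad \sigma \to 0, \] where $\hat{\theta}_\Theta$ is the LSE over $\Theta$ and $T_\Theta(\theta^\star)$ is the tangent cone at $\theta^\star$ (this is the standard formulation; the paper may state it differently). The short sketch below estimates the risk of projection-based LSEs by Monte Carlo for two nested convex sets. The sets (two concentric Euclidean balls), the dimension, and the noise levels are hypothetical choices for illustration only; they are not the paper's construction and are not expected to exhibit risk reversal.

```python
# Monte Carlo sketch of the risk of constrained least squares estimators
# in the Gaussian sequence model Y = theta_star + sigma * Z, Z ~ N(0, I_d).
# The nested sets used here (two Euclidean balls) are illustrative choices,
# not the paper's counterexample, so no risk reversal is claimed for them.
import numpy as np


def project_ball(y, center, radius):
    """Euclidean projection of y onto the ball {x : ||x - center|| <= radius}."""
    diff = y - center
    norm = np.linalg.norm(diff)
    if norm <= radius:
        return y
    return center + radius * diff / norm


def mc_risk(theta_star, sigma, project, n_rep=20000, seed=0):
    """Monte Carlo estimate of E||project(Y) - theta_star||^2."""
    rng = np.random.default_rng(seed)
    d = theta_star.size
    total = 0.0
    for _ in range(n_rep):
        y = theta_star + sigma * rng.standard_normal(d)
        total += np.sum((project(y) - theta_star) ** 2)
    return total / n_rep


if __name__ == "__main__":
    d = 5
    theta_star = np.zeros(d)
    # Hypothetical nested constraint sets: a small ball inside a larger one.
    small = lambda y: project_ball(y, np.zeros(d), radius=1.0)   # Theta_S
    large = lambda y: project_ball(y, np.zeros(d), radius=5.0)   # Theta_L
    for sigma in (0.1, 1.0, 10.0):
        r_s = mc_risk(theta_star, sigma, small)
        r_l = mc_risk(theta_star, sigma, large)
        print(f"sigma={sigma:5.1f}  risk over Theta_S={r_s:8.4f}  risk over Theta_L={r_l:8.4f}")
```

For constructions such as the paper's, the same Monte Carlo routine applies with the ball projections replaced by projections onto the relevant nested convex sets; the small-$\sigma$ runs should track the statistical-dimension expansion above, while the large-$\sigma$ runs probe the global-geometry regime in which the ordering may reverse.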
Related papers
- All ERMs Can Fail in Stochastic Convex Optimization: Lower Bounds in Linear Dimension [14.982451024975733]
We show that there exists an instance in which the sample size is linear in the dimension, learning is possible, but the Empirical Risk Minimizer is likely to be unique and to overfit. We provide a novel generalization lower bound of $\Omega\left(\sqrt{T}/m^{1.5}\right)$ for Gradient Descent, where $\eta$ is the learning rate, $T$ is the horizon, and $m$ is the sample size.
arXiv Detail & Related papers (2026-02-09T07:33:01Z) - Super-fast rates of convergence for Neural Networks Classifiers under the Hard Margin Condition [9.993044620455338]
We show that DNNs which minimize the empirical risk with a square loss surrogate and an $\ell_p$ penalty can achieve finite-sample excess risk of order $\mathcal{O}\left(n^{-\alpha}\right)$ for arbitrarily large $\alpha>0$ under the hard-margin condition. The proof relies on a novel decomposition of the excess risk, which might be of independent interest.
arXiv Detail & Related papers (2025-05-13T06:26:04Z) - First-Order Regret in Reinforcement Learning with Linear Function
Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - Stochastic Shortest Path: Minimax, Parameter-Free and Towards
Horizon-Free Regret [144.6358229217845]
We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus.
We prove that EB-SSP achieves the minimax regret rate $\widetilde{O}(B_\star \sqrt{S A K})$, where $K$ is the number of episodes, $S$ is the number of states, and $A$ is the number of actions.
arXiv Detail & Related papers (2021-04-22T17:20:48Z) - Stability and Deviation Optimal Risk Bounds with Convergence Rate
$O(1/n)$ [4.1499725848998965]
We show a high-probability excess risk bound with the rate $O(\log n/n)$ for strongly convex and Lipschitz losses, valid for \emph{any} empirical risk minimization method.
We discuss how $O(\log n/n)$ high-probability excess risk bounds are possible for projected gradient descent in the case of strongly convex and Lipschitz losses without the usual smoothness assumption.
arXiv Detail & Related papers (2021-03-22T17:28:40Z) - Non-Euclidean Differentially Private Stochastic Convex Optimization [15.302167005107135]
We show that noisy stochastic gradient descent (SGD) algorithms attain the optimal excess risk in low-dimensional regimes.
Our work draws upon concepts from the geometry of normed spaces, such as the notions of regularity, uniform convexity, and uniform smoothness.
arXiv Detail & Related papers (2021-03-01T19:48:44Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimate the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian
Classification [54.22421582955454]
We provide the first optimal minimax guarantees for the excess risk in adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff
in Regret [115.85354306623368]
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels.
We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ).
We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^4 S A T}\big)$ regret.
arXiv Detail & Related papers (2020-06-22T19:28:26Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Risk of the Least Squares Minimum Norm Estimator under the Spike
Covariance Model [0.0]
We study the risk of the minimum norm linear least squares estimator when the number of parameters $d$ depends on the sample size $n$ and $\frac{d}{n} \rightarrow \infty$.
We show that in this setting the risk of the minimum norm least squares estimator vanishes relative to the risk of the null estimator (a minimal numerical sketch of this comparison follows the list below).
arXiv Detail & Related papers (2019-12-31T16:58:42Z)
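As a companion to the last entry above, the following minimal sketch (not the cited paper's experiment; all problem sizes, the spike strength, and the noise level are hypothetical choices) computes the minimum norm least squares estimator via the pseudo-inverse in an overparameterized linear model with a spiked covariance, and compares its parameter risk to that of the null estimator.

```python
# Minimal sketch: minimum norm least squares estimator via the pseudo-inverse
# in an overparameterized linear model (d >> n) with a spiked covariance.
# All sizes and constants below are hypothetical choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 2000, 1.0           # sample size, dimension, noise level

beta_star = np.zeros(d)
beta_star[0] = 3.0                     # true signal aligned with the spike direction

def one_trial() -> float:
    # Spiked design: a strong first coordinate, weak isotropic bulk.
    X = 0.05 * rng.standard_normal((n, d))
    X[:, 0] += 5.0 * rng.standard_normal(n)
    y = X @ beta_star + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y   # minimum norm interpolating solution
    return float(np.sum((beta_hat - beta_star) ** 2))

risk_min_norm = np.mean([one_trial() for _ in range(100)])
risk_null = float(np.sum(beta_star ** 2))   # risk of the estimator that always returns 0
print(f"min-norm LSE risk ~ {risk_min_norm:.3f}   null estimator risk = {risk_null:.3f}")
```

Whether the minimum norm risk ends up small relative to the null risk depends on the assumed spike strength, noise level, and aspect ratio $d/n$; the sketch only shows how such a comparison would be run, not the cited paper's asymptotic result.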
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.