Statistical Query Lower Bounds for Smoothed Agnostic Learning
- URL: http://arxiv.org/abs/2602.21191v1
- Date: Tue, 24 Feb 2026 18:46:46 GMT
- Title: Statistical Query Lower Bounds for Smoothed Agnostic Learning
- Authors: Ilias Diakonikolas, Daniel M. Kane,
- Abstract summary: We study the complexity of smoothed agnostic learning, recently introduced by \cite{CKKMS24}, in which the learner competes with the best classifier in a target class under slight perturbations of prototypical inputs. Specifically, we focus on agnostically learning halfspaces under subgaussian distributions in the smoothed model.
- Score: 42.71001191804269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the complexity of smoothed agnostic learning, recently introduced by~\cite{CKKMS24}, in which the learner competes with the best classifier in a target class under slight Gaussian perturbations of the inputs. Specifically, we focus on the prototypical task of agnostically learning halfspaces under subgaussian distributions in the smoothed model. The best known upper bound for this problem relies on $L_1$-polynomial regression and has complexity $d^{\tilde{O}(1/\sigma^2) \log(1/\epsilon)}$, where $\sigma$ is the smoothing parameter and $\epsilon$ is the excess error. Our main result is a Statistical Query (SQ) lower bound providing formal evidence that this upper bound is close to best possible. In more detail, we show that (even for Gaussian marginals) any SQ algorithm for smoothed agnostic learning of halfspaces requires complexity $d^{\Omega(1/\sigma^{2}+\log(1/\epsilon))}$. This is the first non-trivial lower bound on the complexity of this task and nearly matches the known upper bound. Roughly speaking, we show that applying $L_1$-polynomial regression to a smoothed version of the function is essentially best possible. Our techniques involve finding a moment-matching hard distribution by way of linear programming duality. This dual program corresponds exactly to finding a low-degree approximating polynomial to the smoothed version of the target function (which turns out to be the same condition required for the $L_1$-polynomial regression to work). Our explicit SQ lower bound then comes from proving lower bounds on this approximation degree for the class of halfspaces.
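To make the algorithmic side concrete, here is a minimal sketch of $L_1$-polynomial regression in the spirit of the upper bound above. It is an illustrative toy: the dimension, degree, sample size, and corruption rate are assumptions, it fits the raw (unsmoothed) labels, and it is not the paper's construction.

```python
# Minimal sketch of L1-polynomial regression for agnostically learning a
# halfspace. All parameter choices below (d, degree, n, noise rate) are
# illustrative assumptions, not the paper's settings.
import itertools
import numpy as np
from scipy.optimize import linprog

def monomial_features(X, degree):
    """All monomials of total degree <= degree over the columns of X."""
    n, d = X.shape
    feats = [np.ones(n)]
    for k in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(d), k):
            col = np.ones(n)
            for j in combo:
                col = col * X[:, j]
            feats.append(col)
    return np.stack(feats, axis=1)

def l1_poly_regression(X, y, degree):
    """Fit coefficients c minimizing sum_i |phi(x_i)^T c - y_i| via an LP."""
    Phi = monomial_features(X, degree)
    n, m = Phi.shape
    # Variables [c (m), e (n)]: minimize sum(e) s.t. -e <= Phi c - y <= e.
    cost = np.concatenate([np.zeros(m), np.ones(n)])
    A_ub = np.block([[Phi, -np.eye(n)], [-Phi, -np.eye(n)]])
    b_ub = np.concatenate([y, -y])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (m + n))
    return res.x[:m]

# Toy demo: halfspace labels under Gaussian marginals with 10% corruption.
rng = np.random.default_rng(0)
d, n, degree = 3, 500, 3
X = rng.standard_normal((n, d))
w = rng.standard_normal(d)
y = np.sign(X @ w)
y[rng.random(n) < 0.1] *= -1
c = l1_poly_regression(X, y, degree)
preds = np.sign(monomial_features(X, degree) @ c)
print("training error:", np.mean(preds != y))
```

In the smoothed model one would regress against a Gaussian-smoothed version of the halfspace rather than the raw labels; the paper's bounds control the polynomial degree that step requires.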
Related papers
- Robust Learning of Multi-index Models via Iterative Subspace Approximation [36.138661719725626]
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. We show that in the presence of random classification noise, the complexity of our algorithm scales agnostically with $1/\epsilon$.
arXiv Detail & Related papers (2025-02-13T17:37:42Z) - Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
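For orientation, the sketch below generates a toy NGCA instance: data that is standard Gaussian in every direction orthogonal to a hidden unit vector and follows a bimodal law along it. The dimension, sample size, and planted distribution are illustrative assumptions, not the moment-matching hard construction the lower bound uses.

```python
# Toy NGCA instance: Gaussian off a hidden direction v, non-Gaussian along v.
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 5000
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                  # hidden non-Gaussian direction

X = rng.standard_normal((n, d))         # Gaussian in all directions...
signs = rng.choice([-1.0, 1.0], size=n)
along_v = signs + 0.5 * rng.standard_normal(n)   # bimodal 1-d law
X += np.outer(along_v - X @ v, v)       # ...replace the component along v

# Sanity check: the projection onto v has negative excess kurtosis.
proj = X @ v
print("excess kurtosis along v:", np.mean(proj**4) / np.mean(proj**2)**2 - 3)
```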
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method for solving such problems in a distributed master/agents setting, with complexity measured in master-to-agent communications and local calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals [10.06689891744466]
We consider the well-studied problem of learning intersections of halfspaces under the Gaussian distribution in the challenging agnostic learning model.
We prove two variants of our lower bound, each of which combines ingredients from Diakonikolas et al. (2021) with (an extension of) a different earlier approach for SQ lower bounds for the Boolean setting.
arXiv Detail & Related papers (2022-02-10T15:34:10Z) - The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals [47.81107898315438]
We develop a method for finding hard families of examples for a wide class of problems by using LP duality.
We show that $L_1$-polynomial regression is essentially best possible, and therefore that the computational difficulty of learning a concept class is closely related to the degree required to approximate any function from the class in $L_1$-norm (see the LP sketch after this list).
arXiv Detail & Related papers (2021-02-08T18:06:32Z) - Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning [7.6146285961466]
We introduce a new class of algorithms that have sample complexity uniformly bounded for all $\gamma < 1$.
We show that the covariance of the Q-learning algorithm with an optimized step-size sequence is a quadratic function of $1/(1-\gamma)$; an expected, and essentially known result.
arXiv Detail & Related papers (2020-02-24T15:12:41Z)
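As promised above, here is a sketch of the LP connection shared by the main paper and the polynomial-regression entry: by LP duality, exhibiting a moment-matching hard distribution amounts to lower-bounding the best $L_1$ approximation of the target (here $\mathrm{sign}(t)$) by a low-degree polynomial. The sketch estimates that approximation error under a discretized standard Gaussian; the grid size, the truncation at $|t| \le 6$, and the degrees tried are illustrative assumptions.

```python
# Estimate min_{deg(p) <= k} E_{t ~ N(0,1)} |p(t) - sign(t)| by discretizing
# the Gaussian measure on a grid and solving a small LP. Illustrative only.
import numpy as np
from scipy.optimize import linprog

def l1_approx_error_of_sign(degree, grid_size=801, lim=6.0):
    t = np.linspace(-lim, lim, grid_size)
    w = np.exp(-t**2 / 2)
    w /= w.sum()                          # discretized N(0,1) weights
    V = np.vander(t, degree + 1)          # monomial basis at grid points
    n, m = grid_size, degree + 1
    # Variables [c (m), e (n)]: minimize sum_i w_i e_i s.t. |V c - sign(t)| <= e.
    cost = np.concatenate([np.zeros(m), w])
    target = np.sign(t)
    A_ub = np.block([[V, -np.eye(n)], [-V, -np.eye(n)]])
    b_ub = np.concatenate([target, -target])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (m + n))
    return res.fun

for k in (1, 3, 5, 7):
    print(f"degree {k}: L1 error ~ {l1_approx_error_of_sign(k):.4f}")
```

The slow decay of this error in the degree is exactly the kind of approximation-degree lower bound that, via the dual program, yields explicit SQ hardness.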