Reliable Abstention under Adversarial Injections: Tight Lower Bounds and New Upper Bounds
- URL: http://arxiv.org/abs/2602.20111v1
- Date: Mon, 23 Feb 2026 18:30:48 GMT
- Title: Reliable Abstention under Adversarial Injections: Tight Lower Bounds and New Upper Bounds
- Authors: Ezra Edelman, Surbhi Goel
- Abstract summary: We study online learning in the adversarial injection model introduced by [Goel et al. 2017]. We prove a matching $\Omega(\sqrt{T})$ lower bound for VC dimension $1$, establishing a sharp separation between the two information regimes. We introduce a potential-based framework driven by \emph{robust witnesses}: small subsets of labeled examples that certify predictions while remaining resilient to adversarial contamination.
- Score: 16.636823007179082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study online learning in the adversarial injection model introduced by [Goel et al. 2017], where a stream of labeled examples is predominantly drawn i.i.d.\ from an unknown distribution $\mathcal{D}$, but may be interspersed with adversarially chosen instances without the learner knowing which rounds are adversarial. Crucially, labels are always consistent with a fixed target concept (the clean-label setting). The learner is additionally allowed to abstain from predicting, and the total error counts the mistakes whenever the learner decides to predict and incorrect abstentions when it abstains on i.i.d.\ rounds. Perhaps surprisingly, prior work shows that oracle access to the underlying distribution yields $O(d^2 \log T)$ combined error for VC dimension $d$, while distribution-agnostic algorithms achieve only $\tilde{O}(\sqrt{T})$ for restricted classes, leaving open whether this gap is fundamental. We resolve this question by proving a matching $\Omega(\sqrt{T})$ lower bound for VC dimension $1$, establishing a sharp separation between the two information regimes. On the algorithmic side, we introduce a potential-based framework driven by \emph{robust witnesses}, small subsets of labeled examples that certify predictions while remaining resilient to adversarial contamination. We instantiate this framework using two combinatorial dimensions: (1) \emph{inference dimension}, yielding combined error $\tilde{O}(T^{1-1/k})$ for classes of inference dimension $k$, and (2) \emph{certificate dimension}, a new relaxation we introduce. As an application, we show that halfspaces in $\mathbb{R}^2$ have certificate dimension $3$, obtaining the first distribution-agnostic bound of $\tilde{O}(T^{2/3})$ for this class. This is notable since [Blum et al. 2021] showed halfspaces are not robustly learnable under clean-label attacks without abstention.
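The abstract's protocol (a mostly-i.i.d. stream with clean-label adversarial injections, a learner that may abstain, and a combined error counting wrong predictions plus abstentions on i.i.d. rounds) can be sketched as a small simulation. The names `run_protocol` and `IntervalLearner` are illustrative assumptions, not from the paper, and the learner is a toy certainty-based rule for 1-D thresholds (a VC-dimension-$1$ class), not the paper's witness-based algorithm:

```python
import random

def run_protocol(T, target, sample_iid, adversary, learner, p_adv=0.1, seed=0):
    """Simulate the clean-label adversarial-injection protocol.

    Each round is i.i.d. with probability 1 - p_adv and adversarial
    otherwise; labels always agree with the fixed target concept
    (clean-label).  Combined error = mistakes on predicted rounds
    + abstentions on i.i.d. rounds (abstaining on adversarial rounds
    is free).
    """
    rng = random.Random(seed)
    history, combined_error = [], 0
    for _ in range(T):
        is_adv = rng.random() < p_adv
        x = adversary(history) if is_adv else sample_iid(rng)
        y = target(x)                  # labels are always consistent with the target
        pred = learner.predict(x)      # a label, or None to abstain
        if pred is None:
            combined_error += 0 if is_adv else 1   # abstaining on i.i.d. rounds costs
        elif pred != y:
            combined_error += 1                    # any wrong prediction costs
        learner.update(x, y)
        history.append((x, y))
    return combined_error

class IntervalLearner:
    """Toy learner for 1-D thresholds: predicts only when the query is
    resolved by the labeled examples seen so far, and abstains inside
    the remaining uncertainty interval.  With clean labels it never
    mispredicts, so its combined error is just its i.i.d. abstentions."""
    def __init__(self):
        self.max_neg = float("-inf")   # largest point labeled 0 so far
        self.min_pos = float("inf")    # smallest point labeled 1 so far
    def predict(self, x):
        if x <= self.max_neg:
            return 0
        if x >= self.min_pos:
            return 1
        return None                    # uncertain region: abstain
    def update(self, x, y):
        if y == 1:
            self.min_pos = min(self.min_pos, x)
        else:
            self.max_neg = max(self.max_neg, x)
```

For example, with `target = lambda x: int(x >= 0.5)`, i.i.d. draws from `random()`, and an adversary that keeps injecting a fixed point near the threshold, the simulation illustrates the tension the paper studies: predicting too eagerly risks mistakes, while abstaining too often is charged on the i.i.d. rounds.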
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds [9.036777309376697]
Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift.
Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$.
arXiv Detail & Related papers (2024-04-02T23:34:39Z) - Semidefinite programming relaxations and debiasing for MAXCUT-based clustering [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of two sub-gaussian distributions in $\mathbb{R}^p$. We use semidefinite programming relaxations of an integer quadratic program that is formulated as finding the maximum cut on a graph.
arXiv Detail & Related papers (2024-01-16T03:14:24Z) - Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
We show that our algorithm achieves an $\tilde{O}\big((d+\log(|\mathcal{S}|^2 |\mathcal{A}|))\sqrt{K}\big)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, and $|\mathcal{S}|$ and $|\mathcal{A}|$ are the cardinalities of the state and action spaces.
arXiv Detail & Related papers (2023-05-15T05:37:32Z) - Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries.
Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$
We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z) - Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks [8.74634652691576]
We consider a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of probability distributions look like when $n,d$ grow large? We prove sharp bounds in terms of Kullback-Leibler divergence and Rényi information dimension.
arXiv Detail & Related papers (2022-06-14T00:07:33Z) - Adversarially Robust Learning with Tolerance [8.658596218544774]
We study the problem of tolerant adversarial PAC learning with respect to metric perturbation sets.
We show that a variant of the natural perturb-and-smooth algorithm PAC learns any hypothesis class $\mathcal{H}$ with VC dimension $v$ in the $\gamma$-tolerant adversarial setting.
We additionally propose an alternative learning method which yields sample bounds with only linear dependence on the doubling dimension.
arXiv Detail & Related papers (2022-03-02T03:50:16Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Provable Robustness of Adversarial Training for Learning Halfspaces with Noise [95.84614821570283]
We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of label noise.
To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
arXiv Detail & Related papers (2021-04-19T16:35:38Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal{O}(\min\{\sqrt{T},\, \sqrt{\mathcal{E}}\,T^{1/3}\})$ when $\mathcal{E}$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.