Lookahead identification in adversarial bandits: accuracy and memory bounds
- URL: http://arxiv.org/abs/2603.00803v1
- Date: Sat, 28 Feb 2026 20:38:58 GMT
- Title: Lookahead identification in adversarial bandits: accuracy and memory bounds
- Authors: Nataly Brukhim, Nicolò Cesa-Bianchi, Carlo Ciliberto,
- Abstract summary: We study an identification problem in multi-armed bandits.<n>In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a it future time.<n>Past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all.
- Score: 24.731657695290227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a {\it future} time. In adversarial environments, however, past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all. In this work, we introduce \emph{lookahead identification}, a task in which the goal of the learner is to select a future prediction window and commit in advance to an arm whose average reward over that window is within $\varepsilon$ of optimal. Our analysis characterizes both the achievable accuracy of lookahead identification and the memory resources required to obtain it. From an accuracy standpoint, for any horizon $T$ we give an algorithm achieving $\varepsilon = O\bigl(1/\sqrt{\log T}\bigr)$ over $Ω(\sqrt{T})$ prediction windows. This demonstrates that, perhaps surprisingly, identification is possible in adversarial settings, despite significant lack of information. We also prove a near-matching lower bound showing that $\varepsilon = Ω\bigl(1/\log T\bigr)$ is unavoidable. We then turn to investigate the role of memory in our problem, first proving that any algorithm achieving nontrivial accuracy requires $Ω(K)$ bits of memory. Under a natural \emph{local sparsity} condition, we show that the same accuracy guarantees can be achieved using only poly-logarithmic memory.
Related papers
- Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.<n>The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.<n>We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Fingerprinting Codes Meet Geometry: Improved Lower Bounds for Private Query Release and Adaptive Data Analysis [25.476062424924713]
We propose a general framework for proving fingerprinting type lower bounds, that allows us to tailor the technique to the geometry of the query set.<n>We show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $mathcalX$ to accuracy $alpha$ needs $Omega(fracsqrt log|mathcalX| log (1/delta) log Qvarepsilonalpha2)$ samples, matching known upper bounds up to constants
arXiv Detail & Related papers (2024-12-18T23:11:07Z) - Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts.
PS$varepsilon$BAI$+$ is guaranteed to identify an $varepsilon$-optimal arm with probability $ge 1-delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z) - A/B Testing and Best-arm Identification for Linear Bandits with
Robustness to Non-stationarity [28.068960555415014]
We investigate the fixed-budget best-arm identification problem for linear bandits in a potentially non-stationary environment.
An algorithm will aim to correctly identify the best arm $x* := argmax_xinmathcalXxtopsum_t=1Ttheta_t$ with probability as high as possible.
arXiv Detail & Related papers (2023-07-27T19:03:36Z) - Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries.
Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$
We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z) - On the Sample Complexity of Representation Learning in Multi-task
Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor)
We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(Glog(delta_G)+ Xlog(delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear
Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment.
We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity.
We develop a simple procedure to adapt best-arm identification algorithms to tackle $varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z) - There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning
for Mazes [64.05903267230467]
Interpretability is an essential building block for trustworthiness in reinforcement learning systems.
We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
arXiv Detail & Related papers (2022-06-09T04:23:26Z) - On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards.
We propose a Track-and-Stop algorithm that identifies the set of $varepsilon$-good arms w.h.p and enjoys optimality (when $delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z) - Selective Sampling for Online Best-arm Identification [19.767267982167578]
Given a set of potential options, a learner aims to compute with probability greater than $1-delta$.
The main results of this work precisely characterize this trade-off between labeled samples and stopping time.
Our framework is general enough to capture binary classification improving upon previous works.
arXiv Detail & Related papers (2021-10-28T03:02:08Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm
for Stochastic Bandits with Corruptions [91.8283876874947]
We consider a best arm identification (BAI) problem for Sequential bandits with adversarial corruptions in the fixed-budget setting of T steps.
We design a novel randomized algorithm, Probabilistic Shrinking($u$) (PSS($u$)), which is agnostic to the amount of corruptions.
We show that when the CPS is sufficiently large, no algorithm can achieve a BAI probability tending to $1$ as $Trightarrow infty$.
arXiv Detail & Related papers (2020-10-15T17:34:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.