Related papers: Lookahead identification in adversarial bandits: accuracy and memory bounds

URL: http://arxiv.org/abs/2603.00803v1
Date: Sat, 28 Feb 2026 20:38:58 GMT
Title: Lookahead identification in adversarial bandits: accuracy and memory bounds
Authors: Nataly Brukhim, Nicolò Cesa-Bianchi, Carlo Ciliberto,
Abstract summary: We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a future time. Past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all.
Score: 24.731657695290227
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a {\it future} time. In adversarial environments, however, past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all. In this work, we introduce \emph{lookahead identification}, a task in which the goal of the learner is to select a future prediction window and commit in advance to an arm whose average reward over that window is within $\varepsilon$ of optimal. Our analysis characterizes both the achievable accuracy of lookahead identification and the memory resources required to obtain it. From an accuracy standpoint, for any horizon $T$ we give an algorithm achieving $\varepsilon = O\bigl(1/\sqrt{\log T}\bigr)$ over $\Omega(\sqrt{T})$ prediction windows. This demonstrates that, perhaps surprisingly, identification is possible in adversarial settings, despite significant lack of information. We also prove a near-matching lower bound showing that $\varepsilon = \Omega\bigl(1/\log T\bigr)$ is unavoidable. We then turn to investigate the role of memory in our problem, first proving that any algorithm achieving nontrivial accuracy requires $\Omega(K)$ bits of memory. Under a natural \emph{local sparsity} condition, we show that the same accuracy guarantees can be achieved using only poly-logarithmic memory.
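Illustration (not from the paper): the success criterion described in the abstract can be sketched as a hindsight check. The names `window_gap`, `is_eps_optimal`, `rate_upper`, and `rate_lower`, the half-open window convention, and the toy reward table are all assumptions made for this sketch. The rate functions drop the constants hidden in the $O$ and $\Omega$ notation, so they show only how the two bounds scale with $T$. The learner itself sees bandit feedback only; the full reward table is used here purely to evaluate a commitment after the fact.

```python
import math


def window_gap(rewards, arm, start, end):
    """Best arm's average reward minus `arm`'s average over rounds [start, end).

    rewards[k][t] is the reward of arm k at round t (full table, hindsight only).
    """
    width = end - start
    averages = [sum(arm_rewards[start:end]) / width for arm_rewards in rewards]
    return max(averages) - averages[arm]


def is_eps_optimal(rewards, arm, start, end, eps):
    """True if committing to `arm` for the window [start, end) is within eps of optimal."""
    return window_gap(rewards, arm, start, end) <= eps


def rate_upper(horizon):
    """Scaling of the accuracy upper bound, 1/sqrt(log T), constants omitted."""
    return 1.0 / math.sqrt(math.log(horizon))


def rate_lower(horizon):
    """Scaling of the accuracy lower bound, 1/log T, constants omitted."""
    return 1.0 / math.log(horizon)


# Hypothetical reward table with K = 2 arms and T = 6 rounds.
rewards = [
    [1, 1, 0, 0, 0, 0],  # arm 0 is good early, then collapses
    [0, 0, 1, 1, 1, 0],  # arm 1 is bad early, then improves
]

# Commit to an arm for the future window covering rounds 2..5.
print(window_gap(rewards, 0, 2, 6))
print(is_eps_optimal(rewards, 1, 2, 6, eps=0.1))
print(rate_upper(10**6), rate_lower(10**6))
```

In the toy table, arm 0 looks best in the first two rounds yet is the worse commitment for the later window, which is the difficulty the abstract points to when it says past performance may offer little information about the future.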

Related papers

Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection. The end goal is to identify the best arm of every objective in the shortest (expected) time subject to an upper bound on the probability of error. We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z)
Fingerprinting Codes Meet Geometry: Improved Lower Bounds for Private Query Release and Adaptive Data Analysis [25.476062424924713]
We propose a general framework for proving fingerprinting-type lower bounds that allows us to tailor the technique to the geometry of the query set. We show that any (sample- and population-)accurate algorithm for answering $Q$ arbitrary adaptive counting queries over a universe $\mathcal{X}$ to accuracy $\alpha$ needs $\Omega\left(\frac{\sqrt{\log|\mathcal{X}| \log(1/\delta)} \log Q}{\varepsilon\alpha^2}\right)$ samples, matching known upper bounds up to constants.
arXiv Detail & Related papers (2024-12-18T23:11:07Z)
Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts. PS$\varepsilon$BAI$^+$ is guaranteed to identify an $\varepsilon$-optimal arm with probability $\ge 1-\delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity [28.068960555415014]
We investigate the fixed-budget best-arm identification problem for linear bandits in a potentially non-stationary environment. An algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as high as possible.
arXiv Detail & Related papers (2023-07-27T19:03:36Z)
Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries. Our main result is that the proposed algorithm almost surely converges to the desired mean $\mu$. We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z)
On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems. Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor). We devise an algorithm OSRL-SC whose sample complexity approaches the lower bound, and scales at most as $H(G\log(\delta_G)+ X\log(\delta_H))$, with $X,G,H$ being, respectively, the number of tasks, representations and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model. The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment. We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity. We develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z)
There is no Accuracy-Interpretability Tradeoff in Reinforcement Learning for Mazes [64.05903267230467]
Interpretability is an essential building block for trustworthiness in reinforcement learning systems. We show that in certain cases, one can achieve policy interpretability while maintaining its optimality.
arXiv Detail & Related papers (2022-06-09T04:23:26Z)
On the complexity of All $\varepsilon$-Best Arms Identification [2.1485350418225244]
We consider the problem of identifying all the $\varepsilon$-optimal arms in a finite multi-armed bandit with Gaussian rewards. We propose a Track-and-Stop algorithm that identifies the set of $\varepsilon$-good arms w.h.p. and enjoys optimality (when $\delta$ goes to zero) in terms of the expected sample complexity.
arXiv Detail & Related papers (2022-02-13T10:54:52Z)
Selective Sampling for Online Best-arm Identification [19.767267982167578]
Given a set of potential options, a learner aims to compute with probability greater than $1-\delta$. The main results of this work precisely characterize this trade-off between labeled samples and stopping time. Our framework is general enough to capture binary classification, improving upon previous works.
arXiv Detail & Related papers (2021-10-28T03:02:08Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes. The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions [91.8283876874947]
We consider a best arm identification (BAI) problem for stochastic bandits with adversarial corruptions in the fixed-budget setting of $T$ steps. We design a novel randomized algorithm, Probabilistic Sequential Shrinking($u$) (PSS($u$)), which is agnostic to the amount of corruptions. We show that when the CPS is sufficiently large, no algorithm can achieve a BAI probability tending to $1$ as $T \rightarrow \infty$.
arXiv Detail & Related papers (2020-10-15T17:34:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.