Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
- URL: http://arxiv.org/abs/2503.04010v2
- Date: Sun, 18 May 2025 20:36:56 GMT
- Title: Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure
- Authors: Aleksandrs Slivkins, Yunzong Xu, Shiliang Zuo
- Abstract summary: We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. Our characterization extends to contextual bandits and interactive decision-making with arbitrary feedback.
- Score: 51.4953549382486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, while prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial identifiability property of the problem instance as the necessary and sufficient condition for the asymptotic success. Notably, once this property holds, the problem becomes easy -- any algorithm will succeed (in the same sense as above), provided it satisfies a mild non-degeneracy condition. Our characterization extends to contextual bandits and interactive decision-making with arbitrary feedback. Examples demonstrating broad applicability and extensions to infinite reward structures are provided.
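Since the paper's subject is the exploitation-only greedy rule, a minimal sketch may help fix ideas. This is an illustration, not the paper's formal algorithm: it assumes the known reward structure is encoded as a finite set of candidate mean-reward vectors, fits the candidate that best explains the data so far, and always plays that candidate's best arm. The encoding, the least-squares fit, and the callback name `pull` are all assumptions made for this sketch.

```python
import numpy as np

def greedy_structured_bandit(structure, pull, horizon):
    """Exploitation-only greedy rule for a finite reward structure.

    `structure` is a (num_hypotheses, num_arms) array; each row is one
    candidate mean-reward vector, and the true vector is assumed to be
    among the rows (one way to encode a "known reward structure").
    `pull(arm)` returns a noisy reward sample for the chosen arm.
    """
    num_arms = structure.shape[1]
    counts = np.zeros(num_arms)
    sums = np.zeros(num_arms)
    history = []
    for _ in range(horizon):
        # Empirical mean per arm (unpulled arms default to 0).
        means = np.divide(sums, counts, out=np.zeros(num_arms), where=counts > 0)
        # Pick the hypothesis whose predictions best match the data so
        # far, weighting each arm by how often it has been observed.
        errors = ((structure - means) ** 2 * counts).sum(axis=1)
        best_hypothesis = structure[np.argmin(errors)]
        # Greedy step: exploit the best arm under the fitted hypothesis;
        # no exploration bonus is ever added.
        arm = int(np.argmax(best_hypothesis))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        history.append((arm, reward))
    return history
```

Tie-breaking and the cold start (all counts zero) are resolved arbitrarily here; such low-level choices are exactly what a full analysis must pin down, and the paper's point is to characterize when a rule like this, which never explores, still achieves sublinear regret.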
Related papers
- Contextual Bandits with Stage-wise Constraints [46.412612374393895]
We study contextual bandits in the presence of a stage-wise constraint (a constraint at each round).
We propose an upper-confidence bound algorithm for the problem and prove a $T$-round regret bound for it; a generic UCB sketch appears after this list.
arXiv Detail & Related papers (2024-01-15T23:58:21Z)
- Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially-many perturbations that an adversary might make to a test-time natural example.
In contrast, the attacker only needs to find one successful perturbation.
We introduce a novel multi-group setting and a corresponding multi-robust learning problem.
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
- Causal Bandits without Graph Learning [28.021500949026766]
We develop an efficient algorithm for finding the parent node of the reward node using atomic interventions.
We extend our algorithm to the case when the reward node has multiple parents.
arXiv Detail & Related papers (2023-01-26T20:27:14Z)
- Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits [27.09804256642197]
We introduce a large family of algorithms inspired by the Frank-Wolfe algorithm.
We construct new explicit algorithms for a broad class of problems.
We explain this phenomenon on an insightful toy problem.
arXiv Detail & Related papers (2021-10-18T09:36:36Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Leveraging Good Representations in Linear Contextual Bandits [131.91060536108301]
A contextual bandit problem may admit multiple linear representations.
Recent works showed that there exist "good" representations for which constant problem-dependent regret can be achieved.
We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation.
arXiv Detail & Related papers (2021-04-08T14:05:31Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
- Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z)
- Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z)
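As referenced in the stage-wise-constraints entry above, here is a generic sketch of an upper-confidence-bound loop with a per-round feasibility filter. It is not the cited paper's algorithm: that work is contextual and its constraint handling is more involved, while `pull` and `feasible` below are hypothetical stand-ins introduced only to make the UCB principle concrete.

```python
import math
import random

def ucb_with_stagewise_filter(num_arms, pull, feasible, horizon):
    """UCB1-style loop restricted each round to a feasible arm set.

    `pull(arm)` returns a noisy reward in [0, 1]; `feasible(arm, t)` is
    a stand-in oracle saying which arms may be played at round t.
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    history = []
    for t in range(1, horizon + 1):
        arms = [a for a in range(num_arms) if feasible(a, t)] or list(range(num_arms))
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            # Play each feasible arm once before trusting its estimate.
            arm = random.choice(untried)
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as an arm accumulates observations.
            arm = max(arms, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        history.append((arm, reward))
    return history
```

The exploration bonus is the standard UCB1 term; falling back to all arms when the filter leaves nothing playable is a design shortcut for this sketch, not something the cited paper does.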
This list is automatically generated from the titles and abstracts of the papers on this site.