Model Selection in Contextual Stochastic Bandit Problems
- URL: http://arxiv.org/abs/2003.01704v3
- Date: Sun, 4 Dec 2022 18:34:02 GMT
- Title: Model Selection in Contextual Stochastic Bandit Problems
- Authors: Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian
Zimmert, Tor Lattimore, Csaba Szepesvari
- Abstract summary: We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
- Score: 51.94632035240787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study bandit model selection in stochastic environments. Our approach
relies on a meta-algorithm that selects between candidate base algorithms. We
develop a meta-algorithm-base algorithm abstraction that can work with general
classes of base algorithms and different types of adversarial meta-algorithms.
Our methods rely on a novel and generic smoothing transformation for bandit
algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection
guarantees for stochastic contextual bandit problems as long as the optimal
base algorithm satisfies a high probability regret guarantee. We show through a
lower bound that even when one of the base algorithms has $O(\log T)$ regret,
in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in
model selection, even asymptotically. Using our techniques, we address model
selection in a variety of problems such as misspecified linear contextual
bandits, linear bandit with unknown dimension and reinforcement learning with
unknown feature maps. Our algorithm requires the knowledge of the optimal base
regret to adjust the meta-algorithm learning rate. We show that without such
prior knowledge any meta-algorithm can suffer a regret larger than the optimal
base regret.
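As a rough illustration of the meta-algorithm/base-algorithm abstraction described above, the sketch below runs an exponential-weights master over a set of base bandit algorithms, with a learning rate set from an assumed target base regret and importance-weighted feedback to the selected base. It is a minimal sketch under those assumptions (the class name, the `act`/`update` interface, and the learning-rate formula are illustrative), not the paper's exact smoothed meta-algorithm.

```python
import numpy as np

class ExpWeightsMetaBandit:
    """Illustrative exponential-weights master over base bandit algorithms.

    Simplified sketch, not the paper's smoothed meta-algorithm: the learning
    rate is set from an assumed target base regret, and the chosen base is
    updated with an importance-weighted reward.
    """

    def __init__(self, base_algorithms, target_regret):
        self.bases = base_algorithms          # each exposes act(context) / update(reward)
        self.n = len(base_algorithms)
        # Heuristic learning rate: uses knowledge of the optimal base regret,
        # mirroring the paper's assumption that this quantity is available.
        self.eta = np.sqrt(np.log(max(self.n, 2))) / max(target_regret, 1.0)
        self.log_weights = np.zeros(self.n)

    def select_base(self):
        probs = np.exp(self.log_weights - self.log_weights.max())
        probs /= probs.sum()
        i = np.random.choice(self.n, p=probs)
        return i, probs[i]

    def step(self, context, pull_arm):
        i, p = self.select_base()
        arm = self.bases[i].act(context)      # chosen base proposes an action
        reward = pull_arm(arm)                # observe bandit feedback
        # Importance weighting keeps the chosen base's reward estimate unbiased.
        self.bases[i].update(reward / p)
        self.log_weights[i] += self.eta * reward / p
        return arm, reward
```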
Related papers
- Second Order Methods for Bandit Optimization and Control [34.51425758864638]
We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex.
We also investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory.
arXiv Detail & Related papers (2024-02-14T04:03:38Z) - Semi-Bandit Learning for Monotone Stochastic Optimization [20.776114616154242]
We provide a generic online learning algorithm for a class of "monotone" problems.
Our framework applies to several fundamental problems in optimization such as the prophet inequality, Pandora's box, knapsack, matchings, and submodular optimization.
arXiv Detail & Related papers (2023-12-24T07:46:37Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (see the illustrative sketch after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Regret Bound Balancing and Elimination for Model Selection in Bandits
and RL [34.15978290083909]
We propose a simple model selection approach for algorithms in bandit and reinforcement learning problems.
We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor.
Unlike recent efforts in model selection for linear bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment.
arXiv Detail & Related papers (2020-12-24T00:53:42Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and that this regret depends on the gap between the highest reward and the other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.