Online Model Selection: a Rested Bandit Formulation
- URL: http://arxiv.org/abs/2012.03522v1
- Date: Mon, 7 Dec 2020 08:23:08 GMT
- Title: Online Model Selection: a Rested Bandit Formulation
- Authors: Leonardo Cella and Claudio Gentile and Massimiliano Pontil
- Abstract summary: We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
- Score: 49.69377391589057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by a natural problem in online model selection with bandit
information, we introduce and analyze a best arm identification problem in the
rested bandit setting, wherein arm expected losses decrease with the number of
times the arm has been played. The shape of the expected loss functions is
similar across arms, and is assumed to be available up to unknown parameters
that have to be learned on the fly. We define a novel notion of regret for this
problem, where we compare to the policy that always plays the arm having the
smallest expected loss at the end of the game. We analyze an arm elimination
algorithm whose regret vanishes as the time horizon increases. The actual rate
of convergence depends in a detailed way on the postulated functional form of
the expected losses. Unlike known model selection efforts in the recent bandit
literature, our algorithm exploits the specific structure of the problem to
learn the unknown parameters of the expected loss function so as to identify
the best arm as quickly as possible. We complement our analysis with a lower
bound, indicating strengths and limitations of the proposed solution.
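To make the setting concrete, below is a minimal simulation sketch. It is not the paper's algorithm: the decreasing loss shape loss_i(n) = a_i + b_i * n^(-rho), the round-robin schedule, the least-squares parameter fit, and the 1/sqrt(t) elimination margin are all hypothetical choices, made only to illustrate the three ingredients named in the abstract: rested losses that decrease with the pull count, arm elimination, and regret measured against always playing the arm with the smallest expected loss at the end of the game.

```python
import numpy as np

# Illustrative rested-bandit simulation; NOT the paper's algorithm.
# Assumed (hypothetical) loss shape: loss_i(n) = a_i + b_i * n**(-rho),
# decreasing in the pull count n, with per-arm unknowns (a_i, b_i).

rng = np.random.default_rng(0)
K, T, rho, noise = 4, 2000, 0.5, 0.05
a = rng.uniform(0.1, 0.4, K)   # asymptotic losses (unknown to the learner)
b = rng.uniform(0.5, 1.0, K)   # transient components (unknown to the learner)

def expected_loss(i, n):
    return a[i] + b[i] * n ** (-rho)

pulls = np.zeros(K, dtype=int)
history = [[] for _ in range(K)]      # (pull count, observed loss) per arm
active = set(range(K))
total_loss = 0.0

for t in range(1, T + 1):
    # Round-robin over the surviving arms.
    i = sorted(active)[t % len(active)]
    pulls[i] += 1
    obs = expected_loss(i, pulls[i]) + noise * rng.standard_normal()
    history[i].append((pulls[i], obs))
    total_loss += obs
    # Naive elimination: least-squares fit of (a_i, b_i) under the known
    # shape, then drop arms whose predicted loss at horizon T is worse
    # than the best prediction by a crude heuristic margin.
    if t % 50 == 0 and len(active) > 1:
        preds = {}
        for j in active:
            ns = np.array([n for n, _ in history[j]], dtype=float)
            ys = np.array([y for _, y in history[j]])
            X = np.column_stack([np.ones_like(ns), ns ** (-rho)])
            ahat, bhat = np.linalg.lstsq(X, ys, rcond=None)[0]
            preds[j] = ahat + bhat * T ** (-rho)
        best = min(preds.values())
        margin = 1.0 / np.sqrt(t)     # heuristic confidence width
        active = {j for j in active if preds[j] <= best + margin}

# Comparator from the abstract: always play the arm with the smallest
# expected loss at the end of the game (i.e., after T pulls).
star = int(np.argmin([expected_loss(i, T) for i in range(K)]))
comparator = sum(expected_loss(star, n) for n in range(1, T + 1))
print(f"regret estimate: {total_loss - comparator:.2f}")
```

Because surviving arms share the budget of T pulls, each accrues losses early on its own curve; this is exactly the cost the end-of-game comparator penalizes, which is why the exploration schedule matters here.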
Related papers
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
- Best Arm Identification with Minimal Regret [55.831935724659175]
This work studies a problem that elegantly amalgamates regret minimization and best arm identification (BAI).
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework that generalizes rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs (a toy sketch of these two shapes appears after this list).
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
- Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time elapsed since it was last pulled.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played in each round, subject to feasibility constraints.
We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the feasible subset of maximum expected reward among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit algorithm.
arXiv Detail & Related papers (2021-05-22T02:46:04Z)
- Online Algorithm for Unsupervised Sequential Selection with Contextual Information [38.47252956961143]
We study Contextual Unsupervised Sequential Selection (USS).
USS is a new variant of the contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.
We propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret.
arXiv Detail & Related papers (2020-10-23T12:32:21Z)
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions rather than the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
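As a toy illustration of the rising/rotting distinction mentioned in the Graph-Triggered Bandits entry above, here are two hypothetical monotone expected-reward shapes, chosen purely for illustration and not taken from that paper:

```python
import numpy as np

# Hypothetical monotone expected-reward shapes (illustration only):
# "rising": reward grows with the number of triggers n;
# "rotting": reward decays with n.
def rising(n, tau=50.0):
    return 1.0 - np.exp(-n / tau)   # increases toward 1

def rotting(n, tau=50.0):
    return np.exp(-n / tau)         # decreases toward 0

n = np.arange(1, 6)
print("rising :", np.round(rising(n), 3))
print("rotting:", np.round(rotting(n), 3))
```

Note that the rested setting of the main paper, where expected *losses* decrease with the pull count, mirrors the rising case once losses are viewed as negative rewards.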