Online Model Selection: a Rested Bandit Formulation
- URL: http://arxiv.org/abs/2012.03522v1
- Date: Mon, 7 Dec 2020 08:23:08 GMT
- Title: Online Model Selection: a Rested Bandit Formulation
- Authors: Leonardo Cella and Claudio Gentile and Massimiliano Pontil
- Abstract summary: We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
- Score: 49.69377391589057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by a natural problem in online model selection with bandit
information, we introduce and analyze a best arm identification problem in the
rested bandit setting, wherein arm expected losses decrease with the number of
times the arm has been played. The shape of the expected loss functions is
similar across arms, and is assumed to be available up to unknown parameters
that have to be learned on the fly. We define a novel notion of regret for this
problem, where we compare to the policy that always plays the arm having the
smallest expected loss at the end of the game. We analyze an arm elimination
algorithm whose regret vanishes as the time horizon increases. The actual rate
of convergence depends in a detailed way on the postulated functional form of
the expected losses. Unlike known model selection efforts in the recent bandit
literature, our algorithm exploits the specific structure of the problem to
learn the unknown parameters of the expected loss function so as to identify
the best arm as quickly as possible. We complement our analysis with a lower
bound, indicating strengths and limitations of the proposed solution.
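The setting above can be made concrete with a minimal simulation. The sketch below assumes a power-law form for the rested losses (arm k's expected loss on its (n+1)-th play is c_k(n+1)^(-α), with the shape α shared and known, and the scales c_k unknown) and a generic elimination rule; the functional form, the confidence widths, and all constants are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parametric form (illustrative): arm k's expected loss on its
# (n+1)-th play is c_k * (n+1)^(-ALPHA), with ALPHA known and shared
# across arms, and the scales c_k unknown and learned on the fly.
ALPHA = 0.5
TRUE_C = np.array([1.5, 0.8, 1.2])  # unknown per-arm scales (arm 1 is best)
NOISE = 0.05
HORIZON = 3000

def pull(arm, n):
    """Noisy loss of `arm` on its (n+1)-th play (rested: depends on n)."""
    return TRUE_C[arm] * (n + 1) ** (-ALPHA) + NOISE * rng.standard_normal()

def arm_elimination(horizon=HORIZON):
    k = len(TRUE_C)
    active = list(range(k))
    pulls = np.zeros(k, dtype=int)
    xx = np.zeros(k)  # sum of basis^2 per arm, for least squares on c_k
    xy = np.zeros(k)  # sum of basis * observed loss per arm
    for t in range(horizon):
        arm = active[t % len(active)]     # round-robin over surviving arms
        x = (pulls[arm] + 1) ** (-ALPHA)  # known decay shape, unknown scale
        y = pull(arm, pulls[arm])
        pulls[arm] += 1
        xx[arm] += x * x
        xy[arm] += x * y
        if len(active) > 1 and pulls[active].min() > 1:
            c_hat = xy[active] / xx[active]          # per-arm least squares
            width = 3 * NOISE / np.sqrt(xx[active])  # crude confidence width
            best_ucb = (c_hat + width).min()
            # Since the projected end-of-game loss c_k * T^(-ALPHA) is
            # monotone in c_k, comparing scales suffices: keep an arm only
            # while its lower bound could still beat the smallest upper bound.
            active = [a for a, lo in zip(active, c_hat - width)
                      if lo <= best_ucb]
    return active, pulls

active, pulls = arm_elimination()
print("surviving arms:", active)
```

The point of the sketch is the interplay the abstract describes: because the loss shape is shared, a few pulls of each arm already constrain its scale, so poor arms can be eliminated long before the horizon and the budget concentrates on the arm that will be best at the end of the game.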
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
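A toy rendition of this abstention mechanic may help; the payoff structure here is an assumption (abstaining yields a known fixed payoff `R_ABSTAIN`, and no feedback is observed on abstained rounds), not the paper's actual model, and the UCB rule is a generic stand-in.

```python
import numpy as np

rng = np.random.default_rng(2)

MEANS = np.array([0.1, 0.2])  # hypothetical Bernoulli arm means
R_ABSTAIN = 0.5               # assumed fixed payoff for abstaining
T = 2000

counts = np.ones(len(MEANS))  # start at 1 to avoid division by zero
sums = np.zeros(len(MEANS))
abstained = 0
total = 0.0

for t in range(1, T + 1):
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    arm = int(np.argmax(ucb))
    if ucb[arm] < R_ABSTAIN:
        # No arm could plausibly beat the safe payoff: abstain,
        # collect R_ABSTAIN, and observe nothing this round.
        total += R_ABSTAIN
        abstained += 1
        continue
    reward = float(rng.random() < MEANS[arm])
    counts[arm] += 1
    sums[arm] += reward
    total += reward

print(f"abstained {abstained}/{T} rounds, average payoff {total / T:.3f}")
```

When every arm's mean lies below the safe payoff, the agent learns to abstain on most rounds, so its average payoff exceeds what any fixed arm could deliver.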
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit algorithm.
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Online Algorithm for Unsupervised Sequential Selection with Contextual Information [38.47252956961143]
We study Contextual Unsupervised Sequential Selection (USS).
USS is a new variant of the contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback.
We propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret.
arXiv Detail & Related papers (2020-10-23T12:32:21Z) - Thompson Sampling for Unsupervised Sequential Selection [3.5436187733613087]
We study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem.
The USS problem is a variant of the multi-armed bandit problem, where the loss of an arm cannot be inferred from the observed feedback.
We show that our Thompson Sampling based algorithm for the USS problem achieves near optimal regret and has better numerical performance than existing algorithms.
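For reference, the sketch below is standard Thompson Sampling for Bernoulli rewards with Beta posteriors, not the USS-specific variant (the blurb gives no detail on how unobservable losses are handled, so that part is not reproduced); arm means and horizon are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard Bernoulli Thompson Sampling: maintain a Beta posterior per arm,
# sample one plausible mean from each, and play the arm whose sample is best.
MEANS = np.array([0.3, 0.5, 0.7])  # hypothetical arm means (arm 2 is best)
T = 2000

alpha = np.ones(len(MEANS))  # Beta posterior: 1 + observed successes
beta = np.ones(len(MEANS))   # Beta posterior: 1 + observed failures
counts = np.zeros(len(MEANS), dtype=int)

for _ in range(T):
    theta = rng.beta(alpha, beta)       # posterior sample per arm
    arm = int(np.argmax(theta))         # optimistic-by-sampling choice
    reward = rng.random() < MEANS[arm]  # Bernoulli draw
    alpha[arm] += reward
    beta[arm] += 1 - reward
    counts[arm] += 1

print("pull counts:", counts)
```

After a short exploration phase the posterior of the best arm concentrates and it absorbs nearly all of the remaining pulls.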
arXiv Detail & Related papers (2020-09-16T08:48:17Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated with those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the arm values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.