Related papers: Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

URL: http://arxiv.org/abs/2202.04487v1
Date: Wed, 9 Feb 2022 14:36:05 GMT
Title: Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
Authors: Jasmin Brandt, Bj\"orn Haddenhorst, Viktor Bengs, Eyke H\"ullermeier
Abstract summary: We consider the bandits problem with semi-bandit feedback under finite sampling budget constraints. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies.
Score: 6.759124697337311
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is to choose a set of arms, whereupon feedback for each arm in the chosen set is received. Unlike existing works, we study this problem in a non-stochastic setting with subset-dependent feedback, i.e., the semi-bandit feedback received could be generated by an oblivious adversary and also might depend on the chosen set of arms. In addition, we consider a general feedback scenario covering both the numerical-based as well as preference-based case and introduce a sound theoretical framework for this setting guaranteeing sensible notions of optimal arms, which a learner seeks to find. We suggest a generic algorithm suitable to cover the full spectrum of conceivable arm elimination strategies from aggressive to conservative. Theoretical questions about the sufficient and necessary budget of the algorithm to find the best arm are answered and complemented by deriving lower bounds for any learning algorithm for this problem scenario.

Related papers

An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration [3.9901365062418312]
We consider the best arm identification problem in the $K-$armed bandit framework. Agent is allowed to play a subset of arms at each time slot instead of one arm. We propose an algorithm that constructs $log K$ groups and performs a likelihood ratio test to detect the presence of the best arm.
arXiv Detail & Related papers (2025-02-03T15:10:08Z)
Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs. In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time. We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z)
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits [1.4732811715354452]
We consider a $K$-armed bandit problem in which each arm consumes a different amount of resources when selected. We propose a series of algorithms that are randomized like Thompson Sampling but more carefully optimize their decisions with respect to the budget constraint.
arXiv Detail & Related papers (2024-08-28T04:56:06Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed. SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward. We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy. We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget [4.226118870861363]
We consider a constrained, pure exploration, multi-armed bandit formulation under a fixed budget. We propose an algorithm called textscConstrained-SR based on the Successive Rejects framework. We show that the associated decay rate is nearly optimal relative to an information theoretic lower bound in certain special cases.
arXiv Detail & Related papers (2022-11-27T08:58:16Z)
From Finite to Countable-Armed Bandits [8.099977107670918]
We consider a bandit problem with countably many arms that belong to a finite set of types. There is a fixed distribution over types which sets the proportion of each type in the population of arms. We propose a fully adaptive online learning algorithm that achieves O(log n) distribution-dependent expected cumulative regret after any number of plays n.
arXiv Detail & Related papers (2021-05-22T13:09:50Z)
Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling. In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints. We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms. When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z)
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z)
Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds. Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.