Recurrent Submodular Welfare and Matroid Blocking Bandits
- URL: http://arxiv.org/abs/2102.00321v1
- Date: Sat, 30 Jan 2021 21:51:47 GMT
- Title: Recurrent Submodular Welfare and Matroid Blocking Bandits
- Authors: Orestis Papadigenopoulos and Constantine Caramanis
- Abstract summary: A recent line of research studies the stochastic multi-armed bandit problem (MAB) under structured temporal correlations between the player's actions and the arms' reward distributions.
We develop new algorithmic ideas that allow us to obtain a $(1 - \frac{1}{e})$-approximation for any matroid.
A key ingredient is the technique of correlated (interleaved) scheduling.
- Score: 22.65352007353614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of research focuses on the study of the stochastic multi-armed
bandits problem (MAB), in the case where temporal correlations of specific
structure are imposed between the player's actions and the reward distributions
of the arms (Kleinberg and Immorlica [FOCS18], Basu et al. [NIPS19]). As
opposed to the standard MAB setting, where the optimal solution in hindsight
can be trivially characterized, these correlations lead to (sub-)optimal
solutions that exhibit interesting dynamical patterns -- a phenomenon that
yields new challenges both from an algorithmic as well as a learning
perspective. In this work, we extend the above direction to a combinatorial
bandit setting and study a variant of stochastic MAB, where arms are subject to
matroid constraints and each arm becomes unavailable (blocked) for a fixed
number of rounds after each play. A natural common generalization of the
state-of-the-art for blocking bandits, and that for matroid bandits, yields a
$(1-\frac{1}{e})$-approximation for partition matroids, yet it only guarantees
a $\frac{1}{2}$-approximation for general matroids. In this paper we develop
new algorithmic ideas that allow us to obtain a polynomial-time $(1 -
\frac{1}{e})$-approximation algorithm (asymptotically and in expectation) for
any matroid, and thus allow us to control the $(1-\frac{1}{e})$-approximate
regret. A key ingredient is the technique of correlated (interleaved)
scheduling. Along the way, we discover an interesting connection to a variant
of Submodular Welfare Maximization, for which we provide (asymptotically)
matching upper and lower approximability bounds.
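To make the setting concrete, here is a minimal Python sketch (not the authors' algorithm) of a blocking-bandit instance over a partition matroid, together with the natural greedy baseline mentioned above: in every round, play a maximum-expected-reward independent set among the currently non-blocked arms. All names, the Bernoulli reward model, and the specific parameters are illustrative assumptions.

```python
import random

# Illustrative sketch only: blocking bandits over a partition matroid.
# Each arm i has a mean reward mu[i], a blocking delay d[i] (after a play it
# cannot be played again before round t + d[i]), and belongs to part parts[i];
# at most cap[p] arms per round may be chosen from part p (the matroid constraint).

def greedy_round(mu, parts, cap, blocked_until, t):
    """Play a maximum-expected-reward independent set among non-blocked arms."""
    chosen, used = [], {p: 0 for p in cap}
    # Matroid greedy: scan arms by decreasing mean and add while independence holds.
    for i in sorted(range(len(mu)), key=lambda i: -mu[i]):
        if blocked_until[i] > t:            # arm is still blocked
            continue
        if used[parts[i]] < cap[parts[i]]:  # partition-matroid independence check
            chosen.append(i)
            used[parts[i]] += 1
    return chosen

def simulate(mu, d, parts, cap, horizon, seed=0):
    rng = random.Random(seed)
    blocked_until = [0] * len(mu)
    total = 0.0
    for t in range(horizon):
        for i in greedy_round(mu, parts, cap, blocked_until, t):
            total += 1.0 if rng.random() < mu[i] else 0.0  # Bernoulli reward draw
            blocked_until[i] = t + d[i]                    # arm unavailable until round t + d[i]
    return total / horizon

if __name__ == "__main__":
    mu    = [0.9, 0.8, 0.6, 0.5]   # mean rewards (assumed known for the baseline)
    d     = [3, 2, 1, 1]           # blocking delays
    parts = [0, 0, 1, 1]           # partition-matroid parts
    cap   = {0: 1, 1: 1}           # at most one arm per part each round
    print("average reward per round:", simulate(mu, d, parts, cap, horizon=10_000))
```

For context, the abstract notes that natural baselines of this flavor achieve a $(1-\frac{1}{e})$-approximation on partition matroids but only a $\frac{1}{2}$-approximation on general matroids; the correlated (interleaved) scheduling technique is what closes that gap.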
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Complexity Analysis of a Countable-armed Bandit Problem [9.163501953373068]
We study the classical problem of minimizing the expected cumulative regret over a horizon of play $n$.
We propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of $\mathcal{O}(\log n)$ when $K=2$.
While the order of regret and complexity of the problem suggests a great degree of similarity to the classical MAB problem, properties of the performance bounds and salient aspects of algorithm design are quite distinct from the latter.
arXiv Detail & Related papers (2023-01-18T00:53:46Z)
- Stochastic Rising Bandits [40.32303434592863]
We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing.
This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds.
We empirically compare our algorithms with state-of-the-art methods for non-stationary MABs over several synthetically generated tasks and an online model selection problem for a real-world dataset.
arXiv Detail & Related papers (2022-12-07T17:30:45Z)
- On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
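The Inverse Gap Weighting strategy referenced here is a standard action-sampling rule from the contextual bandit literature rather than something introduced in that paper; a minimal background sketch, with the estimated rewards and the exploration parameter `gamma` as assumed inputs:

```python
import numpy as np

def inverse_gap_weighting(est_rewards, gamma):
    """Standard IGW rule: actions with a larger estimated gap to the best action
    get proportionally smaller probability (background sketch, not the paper's code)."""
    est = np.asarray(est_rewards, dtype=float)
    A = len(est)
    best = int(np.argmax(est))
    probs = 1.0 / (A + gamma * (est[best] - est))  # inverse-gap weights
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()                # leftover mass goes to the greedy action
    return probs

# Example: four actions with estimated rewards and exploration parameter gamma = 10.
print(inverse_gap_weighting([0.9, 0.7, 0.5, 0.1], gamma=10.0))
```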
arXiv Detail & Related papers (2021-12-03T21:42:33Z)
- Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arm can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the feasible subset of maximum expected reward among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit algorithm.
arXiv Detail & Related papers (2021-05-22T02:46:04Z)
- Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T\log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
- Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurring an additional additive regret term.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
- Contextual Blocking Bandits [35.235375147227124]
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards.
Playing an arm blocks it (across all contexts) for a fixed and known number of future time steps.
We propose a UCB-based variant of the full-information algorithm that guarantees $\mathcal{O}(\log T)$ regret w.r.t. an $\alpha$-optimal strategy in $T$ time steps, matching the $\Omega(\log T)$ lower bound.
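The summary does not spell out the UCB-based variant itself; purely as background, a standard UCB1-style index restricted to the currently non-blocked arms could look like the following hypothetical sketch (names and the bonus form are assumptions, not the paper's algorithm).

```python
import math

def ucb_index(mean_est, pulls, t):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")                    # force an initial pull of each arm
    return mean_est + math.sqrt(2.0 * math.log(t + 1) / pulls)

def choose_arm(mean_est, pulls, blocked_until, t):
    """Pick the non-blocked arm with the highest UCB index (illustrative sketch only)."""
    available = [i for i in range(len(mean_est)) if blocked_until[i] <= t]
    if not available:
        return None                            # every arm is currently blocked
    return max(available, key=lambda i: ucb_index(mean_est[i], pulls[i], t))
```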
arXiv Detail & Related papers (2020-03-06T20:34:42Z)