Bandit Algorithms for Prophet Inequality and Pandora's Box
- URL: http://arxiv.org/abs/2211.08586v2
- Date: Thu, 7 Dec 2023 03:44:37 GMT
- Title: Bandit Algorithms for Prophet Inequality and Pandora's Box
- Authors: Khashayar Gatmiry, Thomas Kesselheim, Sahil Singla, and Yifan Wang
- Abstract summary: We study the Prophet Inequality and Pandora's Box problems in the Multi-Armed Bandits model.
Our results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret algorithms for both Prophet Inequality and Pandora's Box.
- Score: 13.709418181148148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Prophet Inequality and Pandora's Box problems are fundamental stochastic
problems with applications in Mechanism Design, Online Algorithms, Stochastic
Optimization, Optimal Stopping, and Operations Research. A usual assumption in
these works is that the probability distributions of the $n$ underlying random
variables are given as input to the algorithm. Since in practice these
distributions need to be learned, we initiate the study of such stochastic
problems in the Multi-Armed Bandits model.
In the Multi-Armed Bandits model we interact with $n$ unknown distributions
over $T$ rounds: in round $t$ we play a policy $x^{(t)}$ and receive a partial
(bandit) feedback on the performance of $x^{(t)}$. The goal is to minimize the
regret, which is the difference over $T$ rounds in the total value of the
optimal algorithm that knows the distributions vs. the total value of our
algorithm that learns the distributions from the partial feedback. Our main
results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret
algorithms for both Prophet Inequality and Pandora's Box.
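To make the interaction protocol concrete, the Python sketch below simulates the prophet-inequality version: each round the policy $x^{(t)}$ is a single acceptance threshold, the bandit feedback is only the value the policy stops on, and regret is accumulated against the per-round prophet value (an upper bound on the distribution-aware optimum). The fixed threshold, the exponential distributions, and all names are illustrative placeholders, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 10_000
scales = rng.uniform(0.2, 1.0, size=n)         # distribution parameters, unknown to the learner

def play_round(threshold):
    """One round of the bandit prophet inequality.
    The policy is a single threshold; feedback is only the value we stop on."""
    xs = rng.exponential(scales)                # hidden realizations X_1..X_n
    accepted = next((x for x in xs if x >= threshold), 0.0)
    prophet = xs.max()                          # per-round prophet benchmark
    return accepted, prophet

total_ours, total_benchmark = 0.0, 0.0
threshold = 0.5                                 # placeholder policy x^(t); a real learner would update it
for t in range(T):
    accepted, prophet = play_round(threshold)
    total_ours += accepted
    total_benchmark += prophet                  # upper bound on the distribution-aware optimum

print("regret proxy over T rounds:", total_benchmark - total_ours)
```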
Our proofs proceed by maintaining confidence intervals on the unknown indices
of the optimal policy. The exploration-exploitation tradeoff prevents us from
directly refining these confidence intervals, so the main technique is to
design a regret upper bound that is learnable while playing low-regret Bandit
policies.
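As a purely illustrative reading of the "confidence intervals on indices" idea, the sketch below runs Pandora's Box with Weitzman's index: the index $\sigma_i$ of box $i$ solves $\mathbb{E}[(X_i-\sigma_i)^+] = c_i$, the expectation is replaced by an empirical average over the samples collected so far, and a heuristic UCB-style bonus makes the index optimistic before the usual descending-index policy is played. The bonus form, warm-up samples, and distributions are assumptions for the sketch, not the algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
costs = np.array([0.1, 0.2, 0.15, 0.3])           # known opening costs c_i
true_scale = np.array([0.5, 1.0, 0.8, 1.2])       # hidden value distributions

def empirical_index(samples, cost, hi=10.0):
    """Bisection for sigma with mean((samples - sigma)^+) == cost:
    Weitzman's index computed on empirical samples instead of the true distribution."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if np.mean(np.maximum(samples - mid, 0.0)) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

samples = [list(rng.exponential(s, 3)) for s in true_scale]   # a few warm-up samples per box

for t in range(1, 301):
    # Optimistic index: empirical index plus a heuristic confidence bonus.
    ucb_index = np.array([
        empirical_index(np.array(samples[i]), costs[i]) + np.sqrt(2 * np.log(t) / len(samples[i]))
        for i in range(n)
    ])
    best_seen = 0.0
    for i in np.argsort(-ucb_index):              # open boxes in decreasing optimistic index
        if best_seen >= ucb_index[i]:
            break                                  # stop once the best value beats every remaining index
        value = rng.exponential(true_scale[i])     # open box i (paying costs[i]) and observe its value
        samples[i].append(value)                   # semi-bandit feedback on the opened boxes
        best_seen = max(best_seen, value)

print("empirical indices after learning:",
      [round(empirical_index(np.array(s), c), 3) for s, c in zip(samples, costs)])
```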
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Semi-Bandit Learning for Monotone Stochastic Optimization [20.776114616154242]
We provide a generic online learning algorithm for a class of "monotone" problems.
Our framework applies to several fundamental problems in optimization such as prophet inequality, Pandora's box, knapsack, matchings, and submodular optimization.
arXiv Detail & Related papers (2023-12-24T07:46:37Z) - Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z) - Maillard Sampling: Boltzmann Exploration Done Optimally [11.282341369957216]
The paper revisits a randomized algorithm for the $K$-armed bandit problem from Maillard's PhD thesis.
Maillard sampling (MS) computes the probability of choosing each arm in a closed form.
We propose a variant of MS called MS$^+$ that improves its minimax bound to $\sqrt{KT\log K}$ without losing asymptotic optimality.
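For context, a minimal sketch of the closed-form arm probabilities in the spirit of Maillard sampling, assuming 1-sub-Gaussian rewards: each arm is played with probability proportional to $\exp(-N_a\hat{\Delta}_a^2/2)$, where $N_a$ is the pull count and $\hat{\Delta}_a$ the empirical gap to the best arm. The exact scaling, and the MS$^+$ modification, may differ from what is written here.

```python
import numpy as np

rng = np.random.default_rng(2)
K, T = 5, 5000
true_means = rng.uniform(0, 1, K)
counts = np.ones(K)                     # pretend each arm was pulled once for initialization
means = rng.normal(true_means, 1.0)     # one initial sample per arm

for _ in range(T - K):
    gaps = means.max() - means                          # empirical gaps to the best arm
    weights = np.exp(-counts * gaps ** 2 / 2.0)         # closed-form weight per arm (sub-Gaussian case)
    probs = weights / weights.sum()
    a = rng.choice(K, p=probs)                          # randomized arm choice
    reward = rng.normal(true_means[a], 1.0)
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]         # running-mean update

print("pull counts:", counts.astype(int), "best arm:", true_means.argmax())
```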
arXiv Detail & Related papers (2021-11-05T06:50:22Z) - Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
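The estimator described above is simple enough to sketch directly: partition the samples into groups, take the empirical median of each group, and average those medians. The group size and the Pareto toy data below are illustrative choices, not the parameters used in the paper.

```python
import numpy as np

def mean_of_medians(samples, group_size):
    """Mean-of-medians: median within each group of samples, then average the medians."""
    samples = np.asarray(samples)
    m = len(samples) // group_size * group_size        # drop the ragged tail
    groups = samples[:m].reshape(-1, group_size)
    return np.median(groups, axis=1).mean()

# Heavy-tailed toy data: the empirical mean is dominated by rare huge payoffs,
# while mean-of-medians stays stable (group_size is an illustrative choice).
rng = np.random.default_rng(3)
payoffs = rng.pareto(1.1, size=10_000)
print("empirical mean:  ", payoffs.mean())
print("mean of medians: ", mean_of_medians(payoffs, group_size=9))
```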
arXiv Detail & Related papers (2021-10-26T17:30:44Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - No Regrets for Learning the Prior in Bandits [30.478188057004175]
$\tt AdaTS$ is a Thompson sampling algorithm that adapts sequentially to bandit tasks.
$\tt AdaTS$ is a fully-Bayesian algorithm that can be implemented efficiently in several classes of bandit problems.
arXiv Detail & Related papers (2021-07-13T15:51:32Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies that maximize cumulative reward while keeping each policy's expected cost below a given threshold.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Improved Sleeping Bandits with Stochastic Action Sets and Adversarial
Rewards [59.559028338399855]
We consider the problem of sleeping bandits with stochastic action sets and adversarial rewards.
In this paper, we provide a new computationally efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$.
arXiv Detail & Related papers (2020-04-14T00:41:26Z)