On Under-exploration in Bandits with Mean Bounds from Confounded Data
- URL: http://arxiv.org/abs/2002.08405v4
- Date: Thu, 10 Jun 2021 14:55:17 GMT
- Title: On Under-exploration in Bandits with Mean Bounds from Confounded Data
- Authors: Nihal Sharma, Soumya Basu, Karthikeyan Shanmugam and Sanjay Shakkottai
- Abstract summary: We study a variant of the multi-armed bandit problem where side information in the form of bounds on the mean of each arm is provided.
We develop the novel non-optimistic Global Under-Explore (GLUE) algorithm which uses the provided mean bounds.
We show that mean bounds can be inferred naturally from logged data confounded by unrecorded variables, and can thus be used to improve the learning process.
- Score: 41.09427248084205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a variant of the multi-armed bandit problem where side information
in the form of bounds on the mean of each arm is provided. We develop the novel
non-optimistic Global Under-Explore (GLUE) algorithm which uses the provided
mean bounds (across all the arms) to infer pseudo-variances for each arm, which
in turn decide the rate of exploration for the arms. We analyze the regret of
GLUE and prove regret upper bounds that are never worse than those of the
standard UCB algorithm. Furthermore, we show that GLUE improves upon regret
guarantees that exist in the literature for structured bandit problems (both
theoretically and empirically). Finally, we study the practical setting of
learning adaptive interventions using prior data that has been confounded by
unrecorded variables that affect rewards. We show that mean bounds can be
inferred naturally from such logs and can thus be used to improve the learning
process. We validate our findings through semi-synthetic experiments on data
derived from real data sets.
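As a rough, self-contained illustration of the two ingredients above, consider rewards in [0, 1]. The sketch below computes classical Manski-style partial-identification bounds from confounded logs (the paper derives its own, potentially tighter, bounds) and runs a UCB learner whose index is clipped to those bounds. The clipping rule is a simplification of GLUE's pseudo-variance construction, not the exact algorithm, and all names (mean_bounds_from_logs, clipped_ucb) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_bounds_from_logs(logged_arms, logged_rewards, n_arms):
    """Manski-style bounds on E[Y | do(a)] for rewards in [0, 1]:
    unlogged cases contribute their worst case (0) to the lower
    bound and their best case (1) to the upper bound."""
    n = len(logged_arms)
    lower = np.zeros(n_arms)
    upper = np.zeros(n_arms)
    for a in range(n_arms):
        mask = logged_arms == a
        lower[a] = logged_rewards[mask].sum() / n    # E[Y * 1{A=a}]
        upper[a] = lower[a] + (n - mask.sum()) / n   # ... + P(A != a)
    return lower, upper

def clipped_ucb(true_means, lower, upper, horizon):
    """UCB whose index is clipped to the side-information interval.
    A simplification of GLUE: arms whose clipped index cannot beat
    the best lower bound stop earning exploration pulls."""
    k = len(true_means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t                                  # one initial pull per arm
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            index = np.clip(sums / counts + bonus, lower, upper)
            arm = int(np.argmax(index))
        reward = float(rng.random() < true_means[arm])  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
    return counts

true_means = np.array([0.9, 0.6, 0.5])
# Hypothetical confounded logs: the logging policy favors arm 1.
logged_arms = rng.choice(3, size=5000, p=[0.1, 0.8, 0.1])
logged_rewards = (rng.random(5000) < true_means[logged_arms]).astype(float)

lo, hi = mean_bounds_from_logs(logged_arms, logged_rewards, n_arms=3)
pulls = clipped_ucb(true_means, lo, hi, horizon=2000)
print("lower:", lo.round(2), "upper:", hi.round(2), "pulls:", pulls)
```

Arms whose clipped index cannot beat another arm's lower bound quickly stop earning exploration pulls, which is the under-exploration effect that separates GLUE from plain UCB.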
Related papers
- Causal bandits with backdoor adjustment on unknown Gaussian DAGs [5.807183284468881]
We study the causal bandit problem when the graph structure is unknown.
We identify backdoor adjustment sets for each arm using sequentially generated experimental and observational data (the textbook backdoor-adjustment formula is sketched after this list).
We develop a novel bandit algorithm, based on modified upper confidence bounds, to sequentially determine the optimal intervention.
arXiv Detail & Related papers (2025-02-04T05:18:35Z)
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study pure exploration problems in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and properties of convexity to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
arXiv Detail & Related papers (2024-10-24T15:26:14Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem.
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z)
- From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination (a classic successive-elimination loop is sketched after this list).
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits [91.3755431537592]
This paper unifies the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem.
Using the contraction principle in the theory of large deviations, we prove novel concentration bounds for continuous risk functionals.
We show that a wide class of risk functionals as well as "nice" functions of them satisfy the continuity condition.
arXiv Detail & Related papers (2021-08-25T17:09:01Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR); an empirical CVaR estimator is sketched after this list.
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions (a distorted-expectation sketch follows the list).
We consider the aforementioned problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
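For the last entry, "distorted probabilities" usually means that expected rewards are computed under a probability weighting function, as in cumulative prospect theory. A minimal sketch, assuming nonnegative rewards and the Tversky-Kahneman weighting function; that paper's exact distortion may differ.

```python
import numpy as np

def tk_weight(p, eta=0.61):
    """Tversky-Kahneman weighting: overweights small probabilities."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)

def distorted_value(samples, weight=tk_weight):
    """Distorted expectation of a nonnegative reward X:
    integral over t of w(P(X > t)) dt, estimated from the empirical CDF."""
    x = np.sort(np.asarray(samples, dtype=float))   # ascending order
    n = len(x)
    survival = np.arange(n, 0, -1) / n              # P(X > t) on each slab
    widths = np.diff(np.concatenate(([0.0], x)))    # slab widths, assumes X >= 0
    return float(np.sum(weight(survival) * widths))

rng = np.random.default_rng(2)
u = rng.random(10_000)                              # Uniform[0, 1] rewards
print("plain mean:", u.mean(), "distorted:", distorted_value(u))
```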
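For the causal-bandits entry above, the textbook backdoor adjustment recovers interventional means from observational data once a valid adjustment set is known; identifying such sets sequentially, as that paper does, is the harder problem. A minimal sketch with one discrete confounder:

```python
import numpy as np

def backdoor_adjusted_mean(y, a, z, arm):
    """E[Y | do(A=arm)] = sum_z P(Z=z) * E[Y | A=arm, Z=z]
    for a valid backdoor set {Z}. Strata never observed with this
    arm are skipped; a real estimator must handle that positivity gap."""
    value = 0.0
    for zv in np.unique(z):
        stratum = z == zv
        pulled = stratum & (a == arm)
        if pulled.any():
            value += stratum.mean() * y[pulled].mean()
    return value

rng = np.random.default_rng(3)
z = rng.integers(0, 2, size=20_000)                   # binary confounder
a = (rng.random(20_000) < 0.3 + 0.4 * z).astype(int)  # arm choice depends on z
y = 0.2 + 0.5 * a + 0.2 * z + rng.normal(0.0, 0.1, 20_000)
print("naive:", y[a == 1].mean())                     # confounded, ~0.84
print("adjusted:", backdoor_adjusted_mean(y, a, z, 1))  # ~0.80, the true do-mean
```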
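For the reward-contamination entry above, a classic (uncontaminated) successive-elimination loop looks as follows; that paper's contribution is making this style of algorithm robust to contaminated rewards, which this sketch does not attempt.

```python
import numpy as np

def successive_elimination(means, delta=0.05, rng=None):
    """Classic successive elimination for best-arm identification:
    sample all surviving arms in rounds, then drop any arm whose upper
    confidence bound falls below the best lower confidence bound."""
    rng = rng or np.random.default_rng()
    k = len(means)
    alive = list(range(k))
    sums = np.zeros(k)
    pulls = 0
    while len(alive) > 1:
        pulls += 1
        for arm in alive:
            sums[arm] += float(rng.random() < means[arm])  # Bernoulli reward
        radius = np.sqrt(np.log(4.0 * k * pulls**2 / delta) / (2.0 * pulls))
        est = sums[alive] / pulls
        best_lcb = est.max() - radius
        alive = [arm for arm, m in zip(alive, est) if m + radius >= best_lcb]
    return alive[0]

print(successive_elimination(np.array([0.9, 0.6, 0.5])))
```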
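Finally, for the CVaR-based entries above, a minimal empirical estimator under one common convention (CVaR of rewards at level alpha is the average of the worst alpha-fraction of outcomes; several papers define CVaR on losses instead):

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Average of the worst alpha-fraction of reward samples
    (one common convention; exact estimators also weight the
    boundary quantile fractionally)."""
    x = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.floor(alpha * len(x))))
    return float(x[:k].mean())

rng = np.random.default_rng(1)
rewards = rng.normal(loc=1.0, scale=2.0, size=100_000)
print("mean:", rewards.mean(), "CVaR_0.05:", empirical_cvar(rewards, 0.05))
```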