Pure Exploration of Causal Bandits
- URL: http://arxiv.org/abs/2206.07883v1
- Date: Thu, 16 Jun 2022 02:19:37 GMT
- Title: Pure Exploration of Causal Bandits
- Authors: Nuoya Xiong, Wei Chen
- Abstract summary: The causal bandit problem integrates causal inference with multi-armed bandits.
Online learning task: given a causal graph with unknown causal inference distributions, in each round we can choose to either intervene on one variable or do no intervention.
We provide the first gap-dependent, fully adaptive pure exploration algorithms for three types of causal models.
- Score: 9.77519365079468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The causal bandit problem integrates causal inference with multi-armed bandits.
The pure exploration of causal bandits is the following online learning task:
given a causal graph with unknown causal inference distributions, in each round
we can choose to either intervene on one variable or do no intervention, and
observe the random outcomes of all random variables, with the goal of using as
few rounds as possible to output an intervention that gives the best (or almost
best) expected outcome on the reward variable $Y$ with probability at least
$1-\delta$, where $\delta$ is a given confidence level. We provide the first
gap-dependent, fully adaptive pure exploration algorithms for three types of
causal models: parallel graphs, general graphs with a small number of backdoor
parents, and binary generalized linear models. Our algorithms improve upon both
prior causal bandit algorithms, which are not adaptive to reward gaps, and prior
adaptive pure exploration algorithms, which do not exploit the special features
of causal bandits.
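To make the task concrete, below is a minimal successive-elimination sketch for the parallel-graph case. It exploits the structural feature the abstract alludes to: in a parallel graph, $E[Y \mid do(X_i = x)] = E[Y \mid X_i = x]$, so a single observational round updates the estimate of every intervention at once, while targeted interventions fill in rarely observed values. The `env.observe`/`env.intervene` interface, the Hoeffding-style radius, and the sampling rule are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

def pure_explore_parallel(env, n_vars, delta, max_rounds=100_000):
    """Successive elimination over interventions do(X_i = x), x in {0, 1},
    for a parallel causal graph with binary variables and reward Y in [0, 1].

    `env.observe()` is assumed to return one observational sample (x_vec, y);
    `env.intervene(i, x)` returns a reward sample under do(X_i = x).
    Both are hypothetical interfaces used only for illustration."""
    arms = [(i, x) for i in range(n_vars) for x in (0, 1)]
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    active = set(arms)

    def radius(n):
        # Hoeffding-style confidence radius with a crude union bound;
        # the paper's gap-dependent thresholds are more refined.
        return np.inf if n == 0 else np.sqrt(
            np.log(4 * len(arms) * max_rounds / delta) / (2 * n))

    for _ in range(max_rounds):
        # Observation rounds are "free" information in the parallel graph:
        # one sample (x_vec, y) updates every arm (i, x_vec[i]) at once,
        # because E[Y | do(X_i = x)] = E[Y | X_i = x] here.
        x_vec, y = env.observe()
        for i in range(n_vars):
            counts[(i, x_vec[i])] += 1
            sums[(i, x_vec[i])] += y

        # Spend one interventional sample on the least-explored active arm,
        # e.g. a value of X_i that observation rarely produces.
        a_min = min(active, key=lambda a: counts[a])
        sums[a_min] += env.intervene(*a_min)
        counts[a_min] += 1

        # Eliminate arms whose upper confidence bound falls below the
        # best lower confidence bound among active arms.
        if any(counts[a] == 0 for a in active):
            continue
        means = {a: sums[a] / counts[a] for a in active}
        best_lcb = max(means[a] - radius(counts[a]) for a in active)
        active = {a for a in active if means[a] + radius(counts[a]) >= best_lcb}
        if len(active) == 1:
            break

    # Output the surviving (or empirically best remaining) intervention.
    return max(active, key=lambda a: sums[a] / max(counts[a], 1))
```

In this sketch, gap dependence enters through the elimination rule: interventions with a large reward gap are discarded after few samples, so the sampling budget concentrates on near-optimal interventions instead of being spread uniformly over all arms.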
Related papers
- Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits [7.064432289838905]
Current works often assume that the causal graph is known, but it may not always be available a priori.
We focus on the causal bandit problem in scenarios where the underlying causal graph is unknown and may include latent confounders.
We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly.
arXiv Detail & Related papers (2024-11-06T16:59:11Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-OFUL$^+$ and Restarted SAVE$^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z) - Additive Causal Bandits with Unknown Graph [10.575089475850465]
We explore algorithms to select actions in the causal bandit setting where the learner can choose to intervene on a set of random variables related by a causal graph.
The learner's goal is to quickly find the intervention, among all interventions on observable variables, that maximizes the expectation of an outcome variable.
arXiv Detail & Related papers (2023-06-13T15:43:04Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard for exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Combinatorial Causal Bandits [25.012065471684025]
In causal bandits, the learning agent chooses at most $K$ variables to intervene on in each round, with the goal of minimizing expected regret on the target variable $Y$.
We study the problem in the context of binary generalized linear models (BGLMs), which give a succinct parametric representation of the causal models.
We present the algorithm BGLM-OFU for Markovian BGLMs based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon.
arXiv Detail & Related papers (2022-06-04T14:14:58Z) - Risk-Aware Algorithms for Combinatorial Semi-Bandits [7.716156977428555]
We study the combinatorial multi-armed bandit problem under semi-bandit feedback.
We consider the problem of maximizing the Conditional Value-at-Risk (CVaR), a risk measure that takes into account only the worst-case rewards.
We propose new algorithms that maximize the CVaR of the rewards obtained from the super arms of the bandit.
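As a brief numerical illustration of the CVaR objective mentioned in the entry above (not taken from that paper), CVaR at level $\alpha$ is the average of the worst $\alpha$-fraction of outcomes; the arm names and distributions below are made up for the example.

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.05):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction (lower tail) of the observed rewards."""
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(rewards))))
    return rewards[:k].mean()

# Hypothetical example: an arm with a higher mean can still be worse in CVaR.
rng = np.random.default_rng(0)
arm_a = rng.normal(1.0, 0.2, size=10_000)  # steady rewards
arm_b = rng.normal(1.2, 1.5, size=10_000)  # higher mean, heavier downside risk
print(empirical_cvar(arm_a), empirical_cvar(arm_b))  # arm_a has the better CVaR
```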
arXiv Detail & Related papers (2021-12-02T11:29:43Z) - Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.