Causal Bandits without prior knowledge using separating sets
- URL: http://arxiv.org/abs/2009.07916v2
- Date: Thu, 29 Sep 2022 12:33:53 GMT
- Title: Causal Bandits without prior knowledge using separating sets
- Authors: Arnoud A.W.M. de Kroon, Danielle Belgrave, Joris M. Mooij
- Abstract summary: The Causal Bandit is a variant of the classic Bandit problem where an agent must identify the best action in a sequential decision-making process.
Methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph.
We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge.
- Score: 3.1000291317725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Causal Bandit is a variant of the classic Bandit problem where an agent
must identify the best action in a sequential decision-making process, where
the reward distribution of the actions displays a non-trivial dependence
structure that is governed by a causal model. Methods proposed for this problem
thus far in the literature rely on exact prior knowledge of the full causal
graph. We formulate new causal bandit algorithms that no longer necessarily
rely on prior causal knowledge. Instead, they utilize an estimator based on
separating sets, which we can find using simple conditional independence tests
or causal discovery methods. We show that, given a true separating set, for
discrete i.i.d. data, this estimator is unbiased, and has variance which is
upper bounded by that of the sample mean. We develop algorithms based on
Thompson Sampling and UCB for discrete and Gaussian models respectively and
show increased performance on simulation data as well as on a bandit drawing
from real-world protein signaling data.
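For discrete i.i.d. data, an estimator based on a separating (adjustment) set can be sketched with the standard back-door adjustment formula. The function below is an illustrative simplification, not the paper's exact estimator: it assumes `Z` is a valid separating set for the effect of `X` on `Y` and computes $\sum_z P(z)\,E[Y \mid x, z]$ from observed samples.

```python
import numpy as np

def separating_set_estimate(x_obs, z_obs, y_obs, x_target):
    """Estimate E[Y | do(X = x_target)] for discrete data via the
    back-door adjustment formula, assuming z_obs records a valid
    separating (adjustment) set Z. Illustrative sketch only."""
    x_obs = np.asarray(x_obs)
    z_obs = np.asarray(z_obs)
    y_obs = np.asarray(y_obs, dtype=float)
    est = 0.0
    for z in np.unique(z_obs):
        p_z = np.mean(z_obs == z)                  # P(Z = z)
        mask = (x_obs == x_target) & (z_obs == z)  # samples with X = x, Z = z
        if mask.any():
            est += p_z * y_obs[mask].mean()        # P(z) * E[Y | x, z]
    return est
```

In the bandit setting, the appeal of such an estimator is that observational samples sharing the same separating-set value contribute to estimating an interventional quantity, which is how prior causal structure can reduce the samples needed per arm.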
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Additive Causal Bandits with Unknown Graph [10.575089475850465]
We explore algorithms to select actions in the causal bandit setting where the learner can choose to intervene on a set of random variables related by a causal graph.
The learner's goal is to quickly find the intervention, among all interventions on observable variables, that maximizes the expectation of an outcome variable.
arXiv Detail & Related papers (2023-06-13T15:43:04Z) - Bivariate Causal Discovery using Bayesian Model Selection [11.726586969589]
We show how to incorporate causal assumptions within the Bayesian framework.
This enables us to construct models with realistic assumptions.
We then outperform previous methods on a wide range of benchmark datasets.
arXiv Detail & Related papers (2023-06-05T14:51:05Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
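The pairwise-comparison idea behind Dirichlet Sampling can be sketched as follows. This is a hedged simplification of the DS scheme, not the authors' implementation: each challenger arm's index is a Dirichlet-weighted re-sampling of its observed rewards, compared against the current leader's empirical mean.

```python
import numpy as np

def dirichlet_index(rewards, rng):
    """Re-sampled index for one arm: a Dirichlet-weighted average of
    its observed rewards (a non-parametric bootstrap of the mean)."""
    w = rng.dirichlet(np.ones(len(rewards)))
    return float(w @ np.asarray(rewards, dtype=float))

def ds_choose_arm(history, leader, rng):
    """Simplified DS round: any challenger whose re-sampled index
    beats the leader's empirical mean is played; otherwise play the
    leader. `history` maps arm -> list of observed rewards."""
    leader_mean = float(np.mean(history[leader]))
    for arm, rewards in history.items():
        if arm != leader and dirichlet_index(rewards, rng) > leader_mean:
            return arm
    return leader
```

The randomness of the Dirichlet weights plays the role that posterior sampling plays in Thompson Sampling, but without committing to a parametric reward model.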
arXiv Detail & Related papers (2021-11-18T14:34:21Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z) - A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z) - Inference for Batched Bandits [9.468593929311867]
We develop methods for inference on data collected in batches using a bandit algorithm.
We first prove that the ordinary least squares estimator (OLS) is not asymptotically normal on data collected using standard bandit algorithms when there is no unique optimal arm.
Second, we introduce the Batched OLS estimator (BOLS) that we prove is (1) normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward.
arXiv Detail & Related papers (2020-02-08T18:59:47Z)
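The batch-wise normalisation behind BOLS can be sketched in a few lines. This is a deliberately simplified, hedged version of the estimator described above: the treatment effect is estimated separately within each batch as a difference of arm means, and the per-batch estimates are then averaged, so that adaptive arm allocation across batches cannot bias the combined estimate.

```python
import numpy as np

def batched_ols_effect(batches):
    """BOLS-flavoured sketch: `batches` is a list of (arms, rewards)
    pairs, where arms is a 0/1 assignment vector for one batch.
    Estimate the effect within each batch, then average across
    batches. A simplification of the estimator described above."""
    per_batch = []
    for arms, rewards in batches:
        arms = np.asarray(arms)
        rewards = np.asarray(rewards, dtype=float)
        effect = rewards[arms == 1].mean() - rewards[arms == 0].mean()
        per_batch.append(effect)
    return float(np.mean(per_batch))
```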
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.