Causal Bandits with Unknown Graph Structure
- URL: http://arxiv.org/abs/2106.02988v1
- Date: Sat, 5 Jun 2021 23:41:38 GMT
- Title: Causal Bandits with Unknown Graph Structure
- Authors: Yangyi Lu, Amirhossein Meisami, Ambuj Tewari
- Abstract summary: We develop novel causal bandit algorithms without knowing the causal graph.
Our algorithms work well for causal trees, causal forests and a general class of causal graphs.
- Score: 24.58691421788476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In causal bandit problems, the action set consists of interventions on
variables of a causal graph. Several researchers have recently studied such
bandit problems and pointed out their practical applications. However, all
existing works rely on a restrictive and impractical assumption that the
learner is given full knowledge of the causal graph structure upfront. In this
paper, we develop novel causal bandit algorithms without knowing the causal
graph. Our algorithms work well for causal trees, causal forests and a general
class of causal graphs. The regret guarantees of our algorithms greatly improve
upon those of standard multi-armed bandit (MAB) algorithms under mild
conditions. Lastly, we prove our mild conditions are necessary: without them
one cannot do better than standard MAB algorithms.
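To make the setting concrete, the sketch below simulates a toy causal chain X1 -> X2 -> Y in which each action is an atomic intervention (or observational play) and the reward is the value of Y, then runs a plain UCB1 baseline over that action set. The chain, its parameters, and the helper names (`sample_reward`, `actions`) are illustrative assumptions, not the paper's model or algorithm; the graph-unaware baseline is exactly the kind of standard MAB approach that the paper's graph-free causal algorithms aim to improve upon.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy causal chain X1 -> X2 -> Y (all binary); the reward variable is Y.
# An atomic intervention do(node = value) clamps one variable; action None
# means purely observational play. All probabilities below are made up.
def sample_reward(intervention,
                  p_x1=0.5,
                  p_x2_given_x1=(0.2, 0.8),
                  p_y_given_x2=(0.3, 0.7)):
    x1 = rng.random() < p_x1
    if intervention is not None and intervention[0] == "X1":
        x1 = bool(intervention[1])
    x2 = rng.random() < p_x2_given_x1[int(x1)]
    if intervention is not None and intervention[0] == "X2":
        x2 = bool(intervention[1])
    return float(rng.random() < p_y_given_x2[int(x2)])

# Causal-bandit action set: every atomic intervention plus "do nothing".
actions = [None, ("X1", 0), ("X1", 1), ("X2", 0), ("X2", 1)]

# Graph-unaware baseline: ordinary UCB1 over the |actions| arms.
T = 2000
counts = np.zeros(len(actions))
sums = np.zeros(len(actions))
for t in range(1, T + 1):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, sums / np.maximum(counts, 1) + bonus, np.inf)
    a = int(np.argmax(ucb))
    r = sample_reward(actions[a])
    counts[a] += 1
    sums[a] += r

# Empirical mean reward of each intervention after T rounds.
print({str(act): round(sums[i] / counts[i], 3) for i, act in enumerate(actions)})
```

The baseline treats interventions as unrelated arms; the paper's point is that, under mild conditions, exploiting the (initially unknown) causal structure yields regret guarantees that this kind of structure-blind approach cannot match.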
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
- Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled as a classic semi-parametric distribution called zero-inflated distribution.
We derive the regret bounds under both multi-armed bandits with general reward assumptions and contextual generalized linear bandit with sub-Gaussian rewards.
In many settings, the regret rates of our algorithms are either minimax optimal or state-of-the-art.
arXiv Detail & Related papers (2023-12-25T03:13:21Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Causal Bandits without Graph Learning [28.021500949026766]
We develop an efficient algorithm for finding the parent node of the reward node using atomic interventions.
We extend our algorithm to the case when the reward node has multiple parents.
arXiv Detail & Related papers (2023-01-26T20:27:14Z)
- Pure Exploration of Causal Bandits [9.77519365079468]
The causal bandit problem integrates causal inference with multi-armed bandits.
The online learning task is: given a causal graph with unknown causal inference distributions, in each round the learner either intervenes on one variable or performs no intervention.
We provide the first gap-dependent, fully adaptive pure exploration algorithms for three types of causal models.
arXiv Detail & Related papers (2022-06-16T02:19:37Z)
- Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z)
- Causal Bandits on General Graphs [1.4502611532302039]
We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph.
We propose a simple regret minimization algorithm that takes as input a semi-Markovian causal graph with atomic interventions and possibly unobservable variables.
Our results indicate that the simple regret guarantee of our proposed algorithm can only be improved by considering more nuanced structural restrictions on the causal graph.
arXiv Detail & Related papers (2021-07-06T17:29:45Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Approximation Theory Based Methods for RKHS Bandits [9.391375268580806]
The RKHS bandit problem is an online optimization problem over non-linear functions with noisy feedback.
Until now, there has been no general algorithm for the adversarial RKHS bandit problem.
We propose efficient algorithms for the RKHS bandit problem and the first general algorithm for the adversarial RKHS bandit problem.
arXiv Detail & Related papers (2020-10-23T05:14:21Z)
- Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
This bound depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
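As referenced in the "Upper Confidence Bounds for Combining Stochastic Bandits" entry above, the sketch below illustrates the meta-UCB idea in its simplest form: $N$ base bandit algorithms are treated as the arms of a higher-level $N$-armed UCB1 bandit, and the base algorithm chosen at each round plays one step and receives the resulting reward. The epsilon-greedy base learners, the Bernoulli environment, and all parameter values are illustrative assumptions; this naive version omits the refinements of the actual corralling/meta-UCB analyses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 5-armed Bernoulli environment shared by all base algorithms (made-up means).
true_means = np.array([0.2, 0.35, 0.5, 0.65, 0.8])

class EpsGreedy:
    """A simple base bandit algorithm: epsilon-greedy over K arms."""
    def __init__(self, k, eps):
        self.eps = eps
        self.counts = np.zeros(k)
        self.sums = np.zeros(k)

    def select(self):
        if self.counts.min() == 0 or rng.random() < self.eps:
            return int(rng.integers(len(self.counts)))
        return int(np.argmax(self.sums / self.counts))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Meta level: each base algorithm is an "arm" of a higher-level UCB1 bandit.
bases = [EpsGreedy(len(true_means), eps) for eps in (0.01, 0.1, 0.3)]
meta_counts = np.zeros(len(bases))
meta_sums = np.zeros(len(bases))

T = 5000
for t in range(1, T + 1):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(meta_counts, 1))
    ucb = np.where(meta_counts > 0,
                   meta_sums / np.maximum(meta_counts, 1) + bonus, np.inf)
    i = int(np.argmax(ucb))            # choose which base algorithm plays
    arm = bases[i].select()            # that algorithm chooses an arm
    reward = float(rng.random() < true_means[arm])
    bases[i].update(arm, reward)       # the base algorithm learns
    meta_counts[i] += 1                # and so does the meta level
    meta_sums[i] += reward

print("rounds given to each base algorithm:", meta_counts.astype(int))
```

Over time the meta level concentrates its plays on whichever base algorithm earns the highest reward, which is the behavior the combining/corralling guarantees formalize.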