Causal Bandits for Linear Structural Equation Models
- URL: http://arxiv.org/abs/2208.12764v3
- Date: Sat, 1 Apr 2023 03:09:01 GMT
- Title: Causal Bandits for Linear Structural Equation Models
- Authors: Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, and Ali Tajer
- Abstract summary: This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
- Score: 58.2875460517691
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of designing an optimal sequence of
interventions in a causal graphical model to minimize cumulative regret with
respect to the best intervention in hindsight. This is, naturally, posed as a
causal bandit problem. The focus is on causal bandits for linear structural
equation models (SEMs) and soft interventions. It is assumed that the graph's
structure is known and has $N$ nodes. Two linear mechanisms, one soft
intervention and one observational, are assumed for each node, giving rise to
$2^N$ possible interventions. The majority of existing causal bandit algorithms
assume that at least the interventional distributions of the reward node's
parents are fully specified. However, there are $2^N$ such distributions (one
per intervention), and acquiring them becomes prohibitive even in
moderate-sized graphs. This paper dispenses with the assumption of knowing
these distributions or their marginals. Two algorithms are proposed for the
frequentist (UCB-based) and Bayesian (Thompson Sampling-based) settings. The
key idea of these algorithms is to avoid directly estimating the $2^N$ reward
distributions and instead estimate the parameters that fully specify the SEMs
(linear in $N$) and use them to compute the rewards. In both algorithms, under
boundedness assumptions on noise and the parameter space, the cumulative
regrets scale as $\tilde{\cal O} (d^{L+\frac{1}{2}} \sqrt{NT})$, where $d$ is
the graph's maximum degree, and $L$ is the length of its longest causal path.
Additionally, a minimax lower bound of $\Omega(d^{\frac{L}{2}-2}\sqrt{T})$ is
presented, which suggests that the achievable and lower bounds conform in their
scaling behavior with respect to the horizon $T$ and graph parameters $d$ and
$L$.
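The key idea above (estimate the SEM parameters, which grow linearly in $N$, and use them to compute the reward of any of the $2^N$ interventions) can be illustrated with a minimal Python sketch. The toy graph, the weight matrices B_obs and B_int, the nonzero noise means nu, and the brute-force scoring loop below are illustrative assumptions, not the paper's exact parameterization or algorithms:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumptions): nodes are in topological order,
# B_obs / B_int are strictly upper-triangular edge-weight matrices for the
# observational and soft-intervened mechanisms, and the noise means nu are
# nonzero so that interventions actually shift the reward.
N = 4
B_obs = np.triu(rng.normal(0.0, 0.5, (N, N)), k=1)
B_int = np.triu(rng.normal(0.0, 0.5, (N, N)), k=1)
nu = rng.normal(1.0, 0.2, N)   # per-node noise means
noise_std = 0.1

def sample_sem(a):
    """Draw one sample X from the linear SEM under intervention a (0/1 per node).

    Node i uses column i of B_int if a[i] == 1, else column i of B_obs;
    X[-1] is taken as the reward node.
    """
    X = np.zeros(N)
    for i in range(N):
        w = B_int[:, i] if a[i] else B_obs[:, i]
        X[i] = w @ X + nu[i] + noise_std * rng.standard_normal()
    return X

def expected_reward(B_obs_hat, B_int_hat, a):
    """Expected reward-node value under intervention a, from SEM weights alone.

    With X = B_a^T X + eps and E[eps] = nu, the means satisfy
    E[X] = (I - B_a^T)^{-1} nu, so the O(N^2) estimated parameters suffice
    to score any of the 2^N interventions: the abstract's key idea.
    (In the actual algorithms, nu would also be estimated from data.)
    """
    B_a = np.where(np.asarray(a)[None, :] == 1, B_int_hat, B_obs_hat)
    mean = np.linalg.solve(np.eye(N) - B_a.T, nu)
    return mean[-1]

# Score all 2^N interventions using the (here, true) weights.
best_a = max(itertools.product([0, 1], repeat=N),
             key=lambda a: expected_reward(B_obs, B_int, a))
print("best intervention:", best_a,
      "expected reward:", expected_reward(B_obs, B_int, best_a))
print("one sampled trajectory under best_a:", sample_sem(best_a))
```

A UCB- or Thompson-Sampling-style algorithm, as in the paper, would replace the true weights with per-node least-squares estimates and add confidence bonuses or posterior samples before taking the maximum, rather than enumerating with known parameters as this sketch does.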
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and characterizes how the regret upper bound scales with the number of nodes $N$, the maximum in-degree $d$, the maximum causal path length $L$, and the number of interaction rounds $T$.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - Improved Bound for Robust Causal Bandits with Linear Models [16.60875994745622]
This paper investigates the robustness of causal bandits in the face of temporal model fluctuations.
The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$.
arXiv Detail & Related papers (2024-05-13T14:41:28Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve a comparable bound.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Robust Causal Bandits for Linear Models [20.028245872662843]
Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs).
This paper addresses the robustness of CBs to temporal model fluctuations.
Cumulative regret is adopted as the design criterion (see the regret definitions after this list), based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations.
arXiv Detail & Related papers (2023-10-30T17:58:01Z) - Learning Good Interventions in Causal Graphs via Covering [12.512036656559685]
An optimal intervention in a given intervention set $A$ is one that maximizes the expected value of a designated reward variable in the graph.
We establish a guarantee of $\widetilde{O}(\sqrt{N/T})$ for simple regret.
We also go beyond prior work and achieve a simple regret guarantee for causal graphs with unobserved variables.
arXiv Detail & Related papers (2023-05-08T11:35:22Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches the corresponding lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise algorithms that match it.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Causal Bandits on General Graphs [1.4502611532302039]
We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph.
We propose a simple regret minimization algorithm that takes as input a semi-Markovian causal graph with atomic interventions and possibly unobservable variables.
Our results indicate that the simple regret guarantee of our proposed algorithm can only be improved by considering more nuanced structural restrictions on the causal graph.
arXiv Detail & Related papers (2021-07-06T17:29:45Z) - Learning Near Optimal Policies with Low Inherent Bellman Error [115.16037976819331]
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning.
We show that exploration is possible using only batch assumptions, with an algorithm that achieves the optimal statistical rate for the setting we consider.
arXiv Detail & Related papers (2020-02-29T02:02:40Z)
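Several of the entries above quote cumulative or simple regret guarantees. For reference, the standard bandit definitions (stated here in causal-bandit notation, with $\mu_t(a)$ the expected reward of intervention $a$ at round $t$, $a_t^\ast$ the oracle's choice, and $\hat{a}_T$ the intervention recommended after $T$ rounds) are:
- Cumulative regret: $R(T) = \sum_{t=1}^{T} \big( \mu_t(a_t^\ast) - \mu_t(a_t) \big)$, the total shortfall of the interventions actually played.
- Simple regret: $r(T) = \mu(a^\ast) - \mu(\hat{a}_T)$, the suboptimality gap of the single recommended intervention.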