Learning Good Interventions in Causal Graphs via Covering
- URL: http://arxiv.org/abs/2305.04638v1
- Date: Mon, 8 May 2023 11:35:22 GMT
- Title: Learning Good Interventions in Causal Graphs via Covering
- Authors: Ayush Sawarni, Rahul Madhavan, Gaurav Sinha, and Siddharth Barman
- Abstract summary: An optimal intervention in $A$ is one that maximizes the expected value for a designated reward variable in the graph.
We establish a simple regret guarantee of $\widetilde{O}(\sqrt{N/T})$.
We also go beyond prior work and achieve a simple regret guarantee for causal graphs with unobserved variables.
- Score: 12.512036656559685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the causal bandit problem that entails identifying a near-optimal
intervention from a specified set $A$ of (possibly non-atomic) interventions
over a given causal graph. Here, an optimal intervention in ${A}$ is one that
maximizes the expected value for a designated reward variable in the graph, and
we use the standard notion of simple regret to quantify near optimality.
Considering Bernoulli random variables and for causal graphs on $N$ vertices
with constant in-degree, prior work has achieved a worst case guarantee of
$\widetilde{O} (N/\sqrt{T})$ for simple regret. The current work utilizes the
idea of covering interventions (which are not necessarily contained within
${A}$) and establishes a simple regret guarantee of
$\widetilde{O}(\sqrt{N/T})$. Notably, and in contrast to prior work, our simple
regret bound depends only on explicit parameters of the problem instance. We
also go beyond prior work and achieve a simple regret guarantee for causal
graphs with unobserved variables. Further, we perform experiments to show
improvements over baselines in this setting.
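The covering idea in the abstract lends itself to a small simulation. The sketch below is a minimal illustration, not the paper's algorithm: it builds a toy three-variable Bernoulli SCM, plays a hand-built covering set of interventions (some of which lie outside the candidate set $A$) to estimate all conditional probability tables, and then recommends the intervention in $A$ with the highest estimated expected reward. The graph, the CPTs, the `COVER` set, and helper names such as `recommend_intervention` are all illustrative assumptions.

```python
"""
Minimal sketch (illustrative only, not the paper's algorithm): simple-regret
minimization via a covering set of interventions on a toy Bernoulli causal
graph X -> Z -> Y, with X also a parent of the reward Y. The graph, the CPTs,
the COVER set, and all helper names below are assumptions for illustration.
"""
import itertools
import random


def sample(intervention, rng):
    """Draw one sample from the toy SCM under a (possibly empty) intervention."""
    x = intervention.get("X", int(rng.random() < 0.4))
    z = intervention.get("Z", int(rng.random() < (0.8 if x else 0.3)))
    y = int(rng.random() < (0.9 if (x and z) else 0.2))  # Y is the reward variable
    return {"X": x, "Z": z, "Y": y}


# Candidate (possibly non-atomic) interventions A among which we want the best.
A = [{}, {"X": 0}, {"X": 1}, {"Z": 0}, {"Z": 1}, {"X": 1, "Z": 1}]

# A hand-built "covering" set: interventions, not necessarily contained in A,
# chosen so that every conditional-probability entry of the model gets observed
# (each variable is left free while its parents are forced to every value).
COVER = [{}, {"X": 0}, {"X": 1},
         {"X": 0, "Z": 0}, {"X": 0, "Z": 1}, {"X": 1, "Z": 0}, {"X": 1, "Z": 1}]


def estimate_model(data):
    """Estimate P(X), P(Z|X), P(Y|X,Z) from pooled (intervention, sample) pairs."""
    def rate(vals):
        return sum(vals) / len(vals) if vals else 0.5  # fallback for uncovered entries
    px = rate([s["X"] for i, s in data if "X" not in i])
    pz = {x: rate([s["Z"] for i, s in data if "Z" not in i and s["X"] == x])
          for x in (0, 1)}
    py = {(x, z): rate([s["Y"] for i, s in data if s["X"] == x and s["Z"] == z])
          for x in (0, 1) for z in (0, 1)}
    return px, pz, py


def expected_reward(a, px, pz, py):
    """Expected reward E[Y | do(a)], computed by enumeration in the estimated model."""
    total = 0.0
    for x, z in itertools.product((0, 1), repeat=2):
        p_x = (1.0 if a["X"] == x else 0.0) if "X" in a else (px if x else 1 - px)
        p_z = (1.0 if a["Z"] == z else 0.0) if "Z" in a else (pz[x] if z else 1 - pz[x])
        total += p_x * p_z * py[(x, z)]
    return total


def recommend_intervention(T, seed=0):
    """Spend the T-round budget uniformly over COVER, then pick the best a in A."""
    rng = random.Random(seed)
    data = [(COVER[t % len(COVER)], sample(COVER[t % len(COVER)], rng))
            for t in range(T)]
    px, pz, py = estimate_model(data)
    return max(A, key=lambda a: expected_reward(a, px, pz, py))


if __name__ == "__main__":
    print("Recommended intervention:", recommend_intervention(T=3000))
```

In this toy model the recommendation concentrates on $\mathrm{do}(X{=}1, Z{=}1)$ as the budget grows. The paper's contribution is a principled covering construction and analysis under which the simple regret of a covering-based procedure scales as $\widetilde{O}(\sqrt{N/T})$, including for graphs with unobserved variables.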
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions: knowledge of the causal graph and knowledge of the intervention models.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and establishes a regret upper bound after $T$ interaction rounds for graphs with $N$ nodes, maximum in-degree $d$, and maximum causal path length $L$.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - Confounded Budgeted Causal Bandits [28.199741662190203]
We study the problem of learning 'good' interventions in an environment modeled by its underlying causal graph.
Good interventions refer to interventions that maximize rewards.
We propose an algorithm to minimize the cumulative regret in general causal graphs.
arXiv Detail & Related papers (2024-01-15T10:26:13Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we design an algorithm that achieves this minimax-optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Estimating Optimal Policy Value in General Linear Contextual Bandits [50.008542459050155]
In many bandit problems, the maximal reward achievable by a policy is often unknown in advance.
We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable.
We present a more practical, computationally efficient algorithm that estimates a problem-dependent upper bound on $V^*$.
arXiv Detail & Related papers (2023-02-19T01:09:24Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\big(\big(\sum_{t=1}^{T}\alpha_t\big)^{1/2}+\max_{t\in[T]}\alpha_t\big)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Causal Bandits on General Graphs [1.4502611532302039]
We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph.
We propose a simple regret minimization algorithm that takes as input a semi-Markovian causal graph with atomic interventions and possibly unobservable variables.
Our results indicate that the simple regret guarantee of our proposed algorithm can only be improved by considering more nuanced structural restrictions on the causal graph.
arXiv Detail & Related papers (2021-07-06T17:29:45Z) - Intervention Efficient Algorithms for Approximate Learning of Causal
Graphs [22.401163479802094]
We study the problem of learning the causal relationships between a set of observed variables in the presence of latents.
Our goal is to recover the directions of all causal or ancestral relations in $G$, via a minimum cost set of interventions.
Our algorithms combine work on efficient intervention design and the design of low-cost separating set systems.
arXiv Detail & Related papers (2020-12-27T17:08:46Z) - Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state with minimum total expected cost.
We show that any learning algorithm must incur at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)