Causal Bandits with General Causal Models and Interventions
- URL: http://arxiv.org/abs/2403.00233v1
- Date: Fri, 1 Mar 2024 02:28:49 GMT
- Title: Causal Bandits with General Causal Models and Interventions
- Authors: Zirui Yan, Dennis Wei, Dmitriy Katz-Rogozhnikov, Prasanna Sattigeri,
Ali Tajer
- Abstract summary: This paper considers causal bandits (CBs) for the sequential design of interventions in a causal system.
The objective is to optimize a reward function via minimizing a measure of cumulative regret with respect to the best sequence of interventions in hindsight.
- Score: 38.112806687145344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers causal bandits (CBs) for the sequential design of
interventions in a causal system. The objective is to optimize a reward
function via minimizing a measure of cumulative regret with respect to the best
sequence of interventions in hindsight. The paper advances the results on CBs
in three directions. First, the structural causal models (SCMs) are assumed to
be unknown and drawn arbitrarily from a general class $\mathcal{F}$ of
Lipschitz-continuous functions. Existing results are often focused on
(generalized) linear SCMs. Second, the interventions are assumed to be
generalized soft with any desired level of granularity, resulting in an
infinite number of possible interventions. The existing literature, in
contrast, generally adopts atomic and hard interventions. Third, we provide
general upper and lower bounds on regret. The upper bounds subsume (and
improve) known bounds for special cases. The lower bounds are generally
hitherto unknown. These bounds are characterized as functions of (i) the graph
parameters, (ii) the eluder dimension of the space of SCMs, denoted by
$\operatorname{dim}(\mathcal{F})$, and (iii) the covering number of the
function space, denoted by ${\rm cn}(\mathcal{F})$. Specifically, the
cumulative achievable regret over horizon $T$ is $\mathcal{O}(K
d^{L-1}\sqrt{T\operatorname{dim}(\mathcal{F}) \log({\rm cn}(\mathcal{F}))})$,
where $K$ is related to the Lipschitz constants, $d$ is the graph's maximum
in-degree, and $L$ is the length of the longest causal path. The upper bound is
further refined for special classes of SCMs (neural network, polynomial, and
linear), and their corresponding lower bounds are provided.
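For reference, a minimal sketch of the regret criterion described in the abstract, written in generic bandit notation that the abstract itself does not define ($a_t$ is the intervention selected at round $t$, $\mu(a_t)$ its expected reward under the unknown SCM, and $a^\star$ a best fixed intervention in hindsight taken as the comparator):
\[
R(T) \;=\; \sum_{t=1}^{T} \big( \mu(a^\star) - \mu(a_t) \big),
\qquad
R(T) \;=\; \mathcal{O}\!\Big( K\, d^{L-1} \sqrt{T \,\operatorname{dim}(\mathcal{F}) \,\log\big({\rm cn}(\mathcal{F})\big)} \Big).
\]
Read this way, the stated upper bound is polynomial in the maximum in-degree $d$, exponential in the causal path length $L$ only through the factor $d^{L-1}$, and scales as $\sqrt{T}$ in the horizon up to the $\mathcal{F}$-dependent factors.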
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and establishes how the regret upper bound scales with these parameters after $T$ interaction rounds in a graph with $N$ nodes, maximum in-degree $d$, and maximum causal path length $L$.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - Improved Bound for Robust Causal Bandits with Linear Models [16.60875994745622]
This paper investigates the robustness of causal bandits in the face of temporal model fluctuations.
The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$.
arXiv Detail & Related papers (2024-05-13T14:41:28Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Robust Causal Bandits for Linear Models [20.028245872662843]
Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs).
This paper addresses the robustness of CBs to such model fluctuations.
Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations.
arXiv Detail & Related papers (2023-10-30T17:58:01Z) - Private Isotonic Regression [54.32252900997422]
We consider the problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function.
We obtain a pure-DP algorithm that has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset.
We show that the bounds above are essentially the best that can be achieved.
arXiv Detail & Related papers (2022-10-27T05:08:07Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We show that HSDMPG can attain an $\mathcal{O}\big(1/\sqrt{tn}\big)$ bound, which is at the order of the excess error of a learning model; a bound of the same order is established for the loss functions considered.
arXiv Detail & Related papers (2020-09-18T02:18:44Z)