Robust Causal Bandits for Linear Models
- URL: http://arxiv.org/abs/2310.19794v2
- Date: Mon, 4 Mar 2024 22:32:38 GMT
- Title: Robust Causal Bandits for Linear Models
- Authors: Zirui Yan, Arpan Mukherjee, Burak Var{\i}c{\i}, Ali Tajer
- Abstract summary: Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs).
This paper addresses the robustness of CBs to such model fluctuations.
Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations.
- Score: 20.028245872662843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential design of experiments for optimizing a reward function in causal
systems can be effectively modeled by the sequential design of interventions in
causal bandits (CBs). In the existing literature on CBs, a critical assumption
is that the causal models remain constant over time. However, this assumption
does not necessarily hold in complex systems, which constantly undergo temporal
model fluctuations. This paper addresses the robustness of CBs to such model
fluctuations. The focus is on causal systems with linear structural equation
models (SEMs). The SEMs and the time-varying pre- and post-interventional
statistical models are all unknown. Cumulative regret is adopted as the design
criterion, based on which the objective is to design a sequence of interventions
that incur the smallest cumulative regret with respect to an oracle aware of
the entire causal model and its fluctuations. First, it is established that the
existing approaches fail to maintain regret sub-linearity with even a few
instances of model deviation. Specifically, when the number of instances with
model deviation is as few as $T^{\frac{1}{2L}}$, where $T$ is the time horizon
and $L$ is the length of the longest causal path in the graph, the existing algorithms will
have linear regret in $T$. Next, a robust CB algorithm is designed, and its
regret is analyzed, where upper and information-theoretic lower bounds on the
regret are established. Specifically, in a graph with $N$ nodes and maximum
degree $d$, under a general measure of model deviation $C$, the cumulative
regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} +
NC))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$.
Comparing these bounds establishes that the proposed algorithm achieves nearly
optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and
maintains sub-linear regret for a broader range of $C$.
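As a quick check on how the two stated bounds relate (a back-of-the-envelope substitution, not an argument from the paper; $N$, $d$, and $L$ are treated as fixed while $T$ grows):
\[
\underbrace{\tilde{\mathcal{O}}\!\left(d^{L-\frac{1}{2}}\big(\sqrt{NT} + NC\big)\right)}_{\text{upper bound}}
\;\xrightarrow{\;C \,=\, o(\sqrt{T})\;}\;
\tilde{\mathcal{O}}\!\left(d^{L-\frac{1}{2}}\sqrt{NT}\right) = \tilde{\mathcal{O}}(\sqrt{T}),
\qquad
\underbrace{\Omega\!\left(d^{\frac{L}{2}-2}\max\{\sqrt{T},\, d^{2}C\}\right)}_{\text{lower bound}} = \Omega(\sqrt{T}),
\]
so both bounds grow as $\sqrt{T}$ in the horizon, and the algorithm is order-optimal in $T$ up to graph-dependent factors of $d$ and polylogarithmic terms. For larger deviation budgets, the $NC$ term dominates the upper bound, which remains sub-linear in $T$ as long as $C = o(T)$.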
Related papers
- Linear Causal Bandits: Unknown Graph and Soft Interventions [18.412177974475526]
Designing causal bandit algorithms depends on two central categories of assumptions.
The problem in its general form, i.e., unknown graph and unknown intervention models, remains open.
This paper addresses this problem and establishes that in a graph with $N$ nodes, maximum in-degree $d$ and maximum causal path length $L$, after $T$ interaction rounds the regret upper bound scales with these quantities.
arXiv Detail & Related papers (2024-11-04T18:50:39Z) - Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators [78.64101336150419]
Predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling.
An alternative approach to such a fully-resolved simulation is to use a coarse grid and then correct its errors through a closure model.
We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation.
arXiv Detail & Related papers (2024-08-09T17:05:45Z) - Improved Bound for Robust Causal Bandits with Linear Models [16.60875994745622]
This paper investigates the robustness of causal bandits in the face of temporal model fluctuations.
The proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$.
arXiv Detail & Related papers (2024-05-13T14:41:28Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings; an illustrative UCB-over-interventions sketch appears after this list.
arXiv Detail & Related papers (2022-08-26T16:21:31Z) - Provably Efficient Model-Free Constrained RL with Linear Function
Approximation [4.060731229044571]
We develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.
Our results are achieved via novel adaptations of the standard LSVI-UCB algorithms.
arXiv Detail & Related papers (2022-06-23T17:54:31Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Model-Based Reinforcement Learning with Value-Targeted Regression [48.92439657407732]
We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$.
We derive a bound on the regret, which, in the special case of linear mixtures, takes the form $\tilde{\mathcal{O}}(d\sqrt{H^3 T})$.
arXiv Detail & Related papers (2020-06-01T17:47:53Z)
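For readers unfamiliar with the causal-bandit interaction loop that recurs in the paper above and in the linear-SEM related papers, the following is a minimal, hypothetical Python sketch: a toy three-node linear SEM, a handful of hard-coded soft interventions (alternative weight matrices), and a plain UCB rule over that finite intervention set. None of this is the algorithm from any of the papers listed; the graph, weights, horizon, and exploration rule are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy linear SEM over the chain 0 -> 1 -> 2; node 2 is the reward node.
N = 3
base_W = np.array([[0.0, 0.8, 0.0],
                   [0.0, 0.0, 0.9],
                   [0.0, 0.0, 0.0]])   # base_W[i, j] = weight of edge i -> j

# Soft interventions modeled here as alternative (hypothetical) weight matrices.
interventions = [
    base_W,
    0.5 * base_W,
    np.array([[0.0, 1.2, 0.0],
              [0.0, 0.0, 0.4],
              [0.0, 0.0, 0.0]]),
]

def sample_reward(W, noise_scale=0.1):
    # Ancestral sampling of X_j = sum_i W[i, j] * X_i + noise; nodes are in topological order.
    X = np.zeros(N)
    for j in range(N):
        exogenous = 1.0 if j == 0 else 0.0           # root node gets a unit input
        X[j] = X @ W[:, j] + exogenous + noise_scale * rng.standard_normal()
    return X[-1]                                      # reward = value of the last node

# Plain UCB over the finite intervention set (illustrative only, not the papers' algorithms).
T = 2000
counts = np.zeros(len(interventions))
means = np.zeros(len(interventions))
for t in range(1, T + 1):
    if t <= len(interventions):
        a = t - 1                                     # play each intervention once
    else:
        a = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
    r = sample_reward(interventions[a])
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]

print("empirical mean reward per intervention:", np.round(means, 3))

The algorithms in the papers above differ from this baseline in a key way: they exploit the SEM structure to share information across interventions rather than treating each intervention as an independent arm, which is what yields the graph-dependent regret scalings quoted above.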