The Minimal Search Space for Conditional Causal Bandits
- URL: http://arxiv.org/abs/2502.06577v1
- Date: Mon, 10 Feb 2025 15:45:18 GMT
- Title: The Minimal Search Space for Conditional Causal Bandits
- Authors: Francisco N. F. Q. Simoes, Itai Feigenbaum, Mehdi Dastani, Thijs van Ommen,
- Abstract summary: Causal knowledge can be used to support decision-making problems.
This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention.
We then propose an efficient algorithm with a time complexity of $O(|V| + |E|)$ to identify this minimal set of nodes.
- Score: 0.18124328823188351
- License:
- Abstract: Causal knowledge can be used to support decision-making problems. This has been recognized in the causal bandits literature, where a causal (multi-armed) bandit is characterized by a causal graphical model and a target variable. The arms are then interventions on the causal model, and rewards are samples of the target variable. Causal bandits were originally studied with a focus on hard interventions. We focus instead on cases where the arms are conditional interventions, which more accurately model many real-world decision-making problems by allowing the value of the intervened variable to be chosen based on the observed values of other variables. This paper presents a graphical characterization of the minimal set of nodes guaranteed to contain the optimal conditional intervention, which maximizes the expected reward. We then propose an efficient algorithm with a time complexity of $O(|V| + |E|)$ to identify this minimal set of nodes. We prove that the graphical characterization and the proposed algorithm are correct. Finally, we empirically demonstrate that our algorithm significantly prunes the search space and substantially accelerates convergence rates when integrated into standard multi-armed bandit algorithms.
Related papers
- Causal bandits with backdoor adjustment on unknown Gaussian DAGs [5.807183284468881]
We study the causal bandit problem when the graph structure is unknown.
We identify backdoor adjustment sets for each arm using sequentially generated experimental and observational data.
We develop a novel bandit algorithm, based on modified upper confidence bounds, to sequentially determine the optimal intervention.
arXiv Detail & Related papers (2025-02-04T05:18:35Z) - Optimal Multi-Objective Best Arm Identification with Fixed Confidence [62.36929749450298]
We consider a multi-armed bandit setting in which each arm yields an $M$-dimensional vector reward upon selection.
The end goal is to identify the best arm of em every objective in the shortest (expected) time subject to an upper bound on the probability of error.
We propose an algorithm that uses the novel idea of em surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step.
arXiv Detail & Related papers (2025-01-23T12:28:09Z) - Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits [7.064432289838905]
Current works often assume the causal graph is known, which may not always be available a priori.
We focus on the causal bandit problem in scenarios where the underlying causal graph is unknown and may include latent confounders.
We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly.
arXiv Detail & Related papers (2024-11-06T16:59:11Z) - Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Pure Exploration of Causal Bandits [9.77519365079468]
Causal bandit problem integrates causal inference with multi-armed bandits.
Online learning task: given a causal graph with unknown causal inference distributions, we can choose to either intervene one variable or do no intervention.
We provide first gap-dependent fully adaptive pure exploration algorithms on three types of causal models.
arXiv Detail & Related papers (2022-06-16T02:19:37Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - Causal Bandits without prior knowledge using separating sets [3.1000291317725]
The Causal Bandit is a variant of the classic Bandit problem where an agent must identify the best action in a sequential decision-making process.
Methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph.
We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge.
arXiv Detail & Related papers (2020-09-16T20:08:03Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with, potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.