Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals
- URL: http://arxiv.org/abs/2407.00950v1
- Date: Mon, 1 Jul 2024 04:12:15 GMT
- Title: Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals
- Authors: Ziyi Liu, Idan Attias, Daniel M. Roy
- Abstract summary: We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments.
We are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting.
- Score: 28.94461817548213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. In addition to the usual reward signal, we assume the learner has access to additional variables, observed in each round after acting. When these variables $d$-separate the action from the reward, existing work in causal bandits demonstrates that one can achieve strictly better (minimax) rates of regret (Lu et al., 2020). Our goal is to adapt to this favorable "conditionally benign" structure, if it is present in the environment, while simultaneously recovering worst-case minimax regret, if it is not. Notably, the learner has no prior knowledge of whether the favorable structure holds. In this paper, we establish the Pareto optimal frontier of adaptive rates. We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments, resolving an open question raised by Bilodeau et al. (2022). Furthermore, we are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting. Finally, we examine the common assumption that the marginal distributions of the post-action contexts are known and show that a nontrivial estimate is necessary for better-than-worst-case minimax rates.
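To make the setting concrete, here is a hedged sketch (based only on the abstract and on Lu et al., 2020 and Bilodeau et al., 2022, not on the paper's own construction) of the "conditionally benign" condition and of why it yields a reduction to linear bandits. If the post-action context $Z$ $d$-separates the action $A$ from the reward $Y$, then $Y \perp A \mid Z$, so
$$\mathbb{E}[Y \mid A = a] \;=\; \sum_{z} P(Z = z \mid A = a)\,\mathbb{E}[Y \mid Z = z].$$
When the marginals $P(Z = z \mid A = a)$ are known, each action $a$ therefore has a known feature vector $\phi(a) = \big(P(Z = z \mid A = a)\big)_{z}$ and mean reward $\langle \phi(a), \theta \rangle$ with unknown parameter $\theta = \big(\mathbb{E}[Y \mid Z = z]\big)_{z}$, which is exactly the linear bandit model.
The sketch below runs a standard LinUCB-style linear bandit algorithm on these features. It is an illustration of the reduction under the assumptions above (known marginals, conditionally benign environment), not the algorithm proposed in the paper, and all variable names are ours.
```python
import numpy as np

# Illustrative sketch, not the paper's algorithm: under the conditionally
# benign condition with known marginals P(Z=z | A=a), action a has feature
# vector phi(a) = (P(Z=z | A=a))_z and mean reward <phi(a), theta> with
# theta_z = E[Y | Z=z], so any linear bandit algorithm applies.
rng = np.random.default_rng(0)
K, Z, T = 5, 8, 2000                            # actions, context values, horizon

phi = rng.dirichlet(np.ones(Z), size=K)         # known marginals P(Z | A), shape (K, Z)
theta_true = rng.uniform(0.2, 0.8, size=Z)      # E[Y | Z=z], unknown to the learner

lam, alpha = 1.0, 1.0
V = lam * np.eye(Z)                             # regularized design matrix
b = np.zeros(Z)                                 # sum of phi(a_t) * y_t

for t in range(T):
    theta_hat = np.linalg.solve(V, b)           # ridge estimate of theta
    V_inv = np.linalg.inv(V)
    bonus = np.sqrt(np.einsum('kz,zw,kw->k', phi, V_inv, phi))
    a = int(np.argmax(phi @ theta_hat + alpha * bonus))   # optimistic action
    z = rng.choice(Z, p=phi[a])                 # post-action context ~ P(. | a)
    y = rng.binomial(1, theta_true[z])          # reward depends on a only through z
    V += np.outer(phi[a], phi[a])
    b += phi[a] * y
```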
Related papers
- Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits [7.064432289838905]
Current works often assume the causal graph is known, but this knowledge may not always be available a priori.
We focus on the causal bandit problem in scenarios where the underlying causal graph is unknown and may include latent confounders.
We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly.
arXiv Detail & Related papers (2024-11-06T16:59:11Z)
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework that generalizes rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
- Non-stationary Bandits with Knapsacks [6.2006721306998065]
We study the problem of bandits with knapsacks (BwK) in a non-stationary environment.
We employ two non-stationarity measures to derive upper and lower bounds for the problem.
arXiv Detail & Related papers (2022-05-25T01:22:36Z)
- Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandit for both stochastic and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O\big(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\big)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z)
- On Slowly-varying Non-stationary Bandits [25.305949034527202]
We consider dynamic regret in non-stationary bandits with a slowly varying property.
We establish the first instance-dependent regret upper bound for slowly varying non-stationary bandits.
We show that our algorithm is essentially minimax optimal.
arXiv Detail & Related papers (2021-10-25T12:56:19Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization [8.704145252476705]
We present a novel formulation as a restless, sleeping bandit with unobserved confounders plus optional stopping.
Case studies show how common misspecifications can lead to sub-optimal rewards.
We also present the first model to exploit cointegration in a restless bandit, demonstrating that finite regret and fast and consistent optional stopping are possible.
arXiv Detail & Related papers (2020-10-07T14:34:28Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Adaptive Discretization for Adversarial Lipschitz Bandits [85.39106976861702]
Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces.
A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions.
We provide the first algorithm for adaptive discretization in the adversarial version, and derive instance-dependent regret bounds.
arXiv Detail & Related papers (2020-06-22T16:06:25Z)
- Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs [48.44657553192801]
We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.
Our approach relies on a simple increasing learning rate schedule, together with the help of logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality.
arXiv Detail & Related papers (2020-06-14T22:09:27Z)