Budgeted and Non-budgeted Causal Bandits
- URL: http://arxiv.org/abs/2012.07058v1
- Date: Sun, 13 Dec 2020 13:31:14 GMT
- Title: Budgeted and Non-budgeted Causal Bandits
- Authors: Vineet Nair, Vishakha Patil, Gaurav Sinha
- Abstract summary: Learning good interventions in a causal graph can be modelled as a multi-armed bandit problem with side-information.
We propose an algorithm to minimize simple regret that optimally trades off observations and interventions based on the cost of intervention.
Our results are experimentally validated and compared to the best-known bounds in the current literature.
- Score: 2.9005223064604078
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Learning good interventions in a causal graph can be modelled as a stochastic
multi-armed bandit problem with side-information. First, we study this problem
when interventions are more expensive than observations and a budget is
specified. When there are no backdoor paths from an intervenable node to the
reward node, we propose an algorithm that minimizes simple regret by
optimally trading off observations and interventions based on the cost of
intervention. We also propose an algorithm that accounts for the cost of
interventions, utilizes causal side-information, and minimizes the expected
cumulative regret without exceeding the budget. Our cumulative-regret
minimization algorithm performs better than standard algorithms that do not
take side-information into account. Finally, we study the problem of learning
best interventions without a budget constraint in general graphs and give an
algorithm that achieves constant expected cumulative regret in terms of the
instance parameters when the parent distribution of the reward variable for
each intervention is known. Our results are experimentally validated and
compared to the best-known bounds in the current literature.
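To make the observation-intervention trade-off concrete, the sketch below simulates a toy instance with a single intervenable parent $X$ of the reward $Y$: cheap observations pin down the reward distribution for parent values that occur often naturally, and the remaining budget goes to interventions on rarely observed values. This is a minimal illustration of the idea under assumed names and a fixed 50/50 budget split, not the paper's algorithm, which chooses the trade-off point optimally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance (names are illustrative, not from the paper):
# a single intervenable parent X in {0, ..., K-1} of a Bernoulli reward Y,
# with P(Y = 1 | X = x) = theta[x] and natural law P(X = x) = p_x[x].
K = 5
theta = rng.uniform(0.2, 0.8, size=K)  # unknown to the learner
p_x = rng.dirichlet(np.ones(K))        # observational distribution of X

def observe():
    """One observational sample (x, y); costs 1 budget unit."""
    x = rng.choice(K, p=p_x)
    return x, float(rng.random() < theta[x])

def intervene(x):
    """One interventional sample y ~ do(X = x); costs c budget units."""
    return float(rng.random() < theta[x])

B, c = 600.0, 5.0  # total budget and per-intervention cost (assumed)
counts, wins = np.zeros(K), np.zeros(K)

# Phase 1 (assumed 50/50 split): cheap observations estimate theta[x]
# well for the values of X that occur frequently under p_x.
for _ in range(int(B / 2)):
    x, y = observe()
    counts[x] += 1.0
    wins[x] += y

# Phase 2: spend the rest intervening round-robin on the least-observed
# values of X, whose estimates are still unreliable.
remaining, i = B / 2, 0
rare = list(np.argsort(counts)[: K // 2 + 1])
while remaining >= c:
    x = rare[i % len(rare)]
    wins[x] += intervene(x)
    counts[x] += 1.0
    remaining -= c
    i += 1

# Recommend the intervention with the best empirical mean (simple regret).
best = int(np.argmax(wins / np.maximum(counts, 1.0)))
print(f"recommended intervention: do(X = {best})")
```

Skewing the split toward cheap observations is what lets the budget stretch further when the intervention cost $c$ is large; the paper derives the optimal trade-off point rather than fixing it.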
Related papers
- Confounded Budgeted Causal Bandits [28.199741662190203]
We study the problem of learning 'good' interventions in an environment modeled by its underlying causal graph.
Good interventions refer to interventions that maximize rewards.
We propose an algorithm to minimize the cumulative regret in general causal graphs.
arXiv Detail & Related papers (2024-01-15T10:26:13Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the arms (for reference, a sketch of the classic successive-rejects baseline for this fixed-budget setting appears after this list).
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Adaptivity Complexity for Causal Graph Discovery [7.424262881242935]
We study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph in at most $r$ sequential rounds.
We provide an $r$-adaptive algorithm that achieves $O(\min\{r, \log n\} \cdot n^{1/\min\{r, \log n\}})$ approximation with respect to the verification number.
arXiv Detail & Related papers (2023-06-09T09:49:16Z) - Stochastic Direct Search Method for Blind Resource Allocation [6.574808513848414]
We study direct search (also known as pattern search) methods for linearly constrained and derivative-free optimization.
We show that direct search methods achieve finite regret in the deterministic and unconstrained case.
We propose a simple extension of direct search that achieves a regret upper-bound of the order of $T^{2/3}$.
arXiv Detail & Related papers (2022-10-11T07:40:45Z) - Quantization for decentralized learning under subspace constraints [61.59416703323886]
We consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints.
We propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates.
The analysis shows that, under some general conditions on the quantization noise, the strategy is stable both in terms of mean-square error and average bit rate.
arXiv Detail & Related papers (2022-09-16T09:38:38Z) - Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe$(\psi, \phi)$, guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Causal Bandits on General Graphs [1.4502611532302039]
We study the problem of determining the best intervention in a Causal Bayesian Network (CBN) specified only by its causal graph.
We propose a simple regret minimization algorithm that takes as input a semi-Markovian causal graph with atomic interventions and possibly unobservable variables.
Our results indicate that the simple regret guarantee of our proposed algorithm can only be improved by considering more nuanced structural restrictions on the causal graph.
arXiv Detail & Related papers (2021-07-06T17:29:45Z) - Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the budget of failures more efficiently in a variety of optimization experiments and generally achieves lower regret than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z) - Budget-Constrained Bandits over General Cost and Reward Distributions [32.63624728528415]
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return.
We show that if moments of order $(2+\gamma)$, for some $\gamma > 0$, exist for all cost-reward pairs, then $O(\log B)$ regret is achievable for a budget $B > 0$ (a generic cost-aware ratio-UCB sketch appears after this list).
arXiv Detail & Related papers (2020-02-29T23:50:08Z)
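For the fixed-budget best-arm identification entry above, here is a minimal sketch of the classic successive-rejects baseline (Audibert, Bubeck, and Munos, 2010) for that setting: $K-1$ phases, each pulling every surviving arm equally often and then rejecting the empirically worst one. The Bernoulli instance and all names are assumed for illustration; the cited sred is adaptive and can reject arms at any round, which this baseline does not do.

```python
import numpy as np

rng = np.random.default_rng(1)

def successive_rejects(means, budget):
    """Fixed-budget best-arm identification via successive rejects:
    K - 1 phases, each pulling every surviving arm equally often, then
    rejecting the empirically worst one.  `means` parameterize assumed
    Bernoulli arms; the learner only sees samples."""
    K = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    alive = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    n_prev = 0
    for k in range(1, K):
        # Phase lengths follow the standard successive-rejects schedule.
        n_k = int(np.ceil((budget - K) / (log_bar * (K + 1 - k))))
        for arm in alive:
            for _ in range(n_k - n_prev):
                sums[arm] += float(rng.random() < means[arm])
                counts[arm] += 1.0
        n_prev = n_k
        worst = min(alive, key=lambda a: sums[a] / counts[a])
        alive.remove(worst)  # reject the empirically worst surviving arm
    return alive[0]

means = [0.30, 0.40, 0.45, 0.50, 0.60]  # assumed instance
print("identified best arm:", successive_rejects(means, budget=5000))
```

An adaptive method like sred need not wait for a phase boundary to reject an arm, which is exactly where it departs from this schedule-driven baseline.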
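For the budget-constrained bandits entry, a common device when each pull has a random cost is an optimistic index on the reward-per-cost ratio. The sketch below is a generic ratio-UCB in that spirit, with an assumed instance, bonus form, and names; it is not the cited paper's algorithm, which requires only $(2+\gamma)$ moments on the cost-reward pairs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed instance: arm i pays reward ~ Bernoulli(mu[i]) and charges a
# random cost around kappa[i] per pull; all names are illustrative.
mu = np.array([0.3, 0.5, 0.7])
kappa = np.array([1.0, 1.2, 0.9])

def pull(i):
    reward = float(rng.random() < mu[i])
    cost = kappa[i] * rng.uniform(0.5, 1.5)
    return reward, cost

B = 2000.0          # total budget to spend
K = len(mu)
n, r, c = np.zeros(K), np.zeros(K), np.zeros(K)
spent, t = 0.0, 0

for i in range(K):  # initialize with one pull per arm
    reward, cost = pull(i)
    n[i] += 1.0; r[i] += reward; c[i] += cost
    spent += cost; t += 1

# Ratio-UCB: optimistic estimate of reward-per-unit-cost for each arm.
# The final pull may slightly overshoot B; a sketch-level simplification.
while spent < B:
    t += 1
    bonus = np.sqrt(2.0 * np.log(t) / n)
    index = (r / n + bonus) / np.maximum(c / n - bonus, 1e-6)
    i = int(np.argmax(index))
    reward, cost = pull(i)
    n[i] += 1.0; r[i] += reward; c[i] += cost
    spent += cost

print("pulls per arm:", n.astype(int))
```

The index inflates rewards and deflates costs for under-sampled arms, so exploration is driven by uncertainty on both sides of the ratio.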