An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
- URL: http://arxiv.org/abs/2311.05794v3
- Date: Fri, 14 Jun 2024 20:24:39 GMT
- Title: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
- Authors: Biyonka Liang, Iavor Bojinov,
- Abstract summary: This paper proposes a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime valid inference on the Average Treatment Effect (ATE)
We show that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experimentation is crucial for managers to rigorously quantify the value of a change and determine if it leads to a statistically significant improvement over the status quo, thus augmenting their decision-making. Many companies now mandate that all changes undergo experimentation, presenting two challenges: (1) reducing the risk/cost of experimentation by minimizing the proportion of customers assigned to the inferior treatment and (2) increasing the experimentation velocity by enabling managers to stop experiments as soon as results are statistically significant. This paper simultaneously addresses both challenges by proposing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime valid inference on the Average Treatment Effect (ATE) for any MAB algorithm. Intuitively, the MAB "mixes" any bandit algorithm with a Bernoulli design such that at each time step, the probability that a customer is assigned via the Bernoulli design is controlled by a user-specified deterministic sequence that can converge to zero. The sequence enables managers to directly and interpretably control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime valid and demonstrate that the MAD is guaranteed to have a finite stopping time in the presence of a true non-zero ATE. Hence, the MAD allows managers to stop experiments early when a significant ATE is detected while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.
Related papers
- VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence [13.612214163974459]
We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data.
VALID protocol is the first to achieve a validated learning guarantee.
Remarkably, VALID retains optimal performance metrics in adversary-free environments.
arXiv Detail & Related papers (2024-05-12T15:55:43Z) - Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate)
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
arXiv Detail & Related papers (2024-04-04T11:32:03Z) - Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study evaluating a policy under best- and worst-case perturbations to a Markov decision process (MDP)
This is an important problem when there is the possibility of a shift between historical and future environments.
We propose a perturbation model that can modify transition kernel densities up to a given multiplicative factor or its reciprocal.
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making.
We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z) - Batch Bayesian Optimization for Replicable Experimental Design [56.64902148159355]
Many real-world design problems evaluate multiple experimental conditions in parallel and replicate each condition multiple times due to large and heteroscedastic observation noise.
We propose the Batch Thompson Sampling for Replicable Experimental Design framework, which encompasses three algorithms.
We show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
arXiv Detail & Related papers (2023-11-02T12:46:03Z) - Distributional Shift-Aware Off-Policy Interval Estimation: A Unified
Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z) - Adapting to Continuous Covariate Shift via Online Density Ratio Estimation [64.8027122329609]
Dealing with distribution shifts is one of the central challenges for modern machine learning.
We propose an online method that can appropriately reuse historical information.
Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound.
arXiv Detail & Related papers (2023-02-06T04:03:33Z) - Double Robust Bayesian Inference on Average Treatment Effects [2.7632791497072553]
We propose a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconness.
We prove equivalence of our Bayesian procedure and efficient frequentist ATE estimators by establishing a new semiparametric robustness theorem under double Bernstein-von Mises theorem.
In simulations, our double robust Bayesian procedure leads to significant bias reduction and more accurate coverage of confidence intervals compared to existing frequentist methods.
arXiv Detail & Related papers (2022-11-29T15:32:25Z) - Conformal Inference of Counterfactuals and Individual Treatment Effects [6.810856082577402]
We propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects.
Existing methods suffer from a significant coverage deficit even in simple models.
arXiv Detail & Related papers (2020-06-11T01:03:32Z) - An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction [84.49035467829819]
We show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.