Related papers: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

URL: http://arxiv.org/abs/2311.05794v4
Date: Tue, 15 Oct 2024 15:25:03 GMT
Title: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits
Authors: Biyonka Liang, Iavor Bojinov,
Abstract summary: This paper introduces the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms. MAD enables anytime-valid inference on the Average Treatment Effect (ATE) for emphany MAB algorithm.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Experimentation is crucial for managers to rigorously quantify the value of a change and determine if it leads to a statistically significant improvement over the status quo. As companies increasingly mandate that all changes undergo experimentation before widespread release, two challenges arise: (1) minimizing the proportion of customers assigned to the inferior treatment and (2) increasing experimentation velocity by enabling data-dependent stopping. This paper addresses both challenges by introducing the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit (MAB) algorithms that enables anytime-valid inference on the Average Treatment Effect (ATE) for \emph{any} MAB algorithm. Intuitively, MAD "mixes" any bandit algorithm with a Bernoulli design, where at each time step, the probability of assigning a unit via the Bernoulli design is determined by a user-specified deterministic sequence that can converge to zero. This sequence lets managers directly control the trade-off between regret minimization and inferential precision. Under mild conditions on the rate the sequence converges to zero, we provide a confidence sequence that is asymptotically anytime-valid and guaranteed to shrink around the true ATE. Hence, when the true ATE converges to a non-zero value, the MAD confidence sequence is guaranteed to exclude zero in finite time. Therefore, the MAD enables managers to stop experiments early while ensuring valid inference, enhancing both the efficiency and reliability of adaptive experiments. Empirically, we demonstrate that the MAD achieves finite-sample anytime-validity while accurately and precisely estimating the ATE, all without incurring significant losses in reward compared to standard bandit designs.

Related papers

COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.<n>COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.<n>We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test [11.199585259018459]
Time-sensitive machine learning benefits from Sequential Probability Ratio Test (SPRT), which provides an optimal stopping time for early classification of time series. In finite horizon scenarios, where input lengths are finite, determining the optimal stopping rule becomes computationally intensive due to the need for backward induction. We introduce FIRMBOUND, an SPRT-based framework that efficiently estimates the solution to backward induction from training data.
arXiv Detail & Related papers (2025-01-29T23:54:46Z)
VALID: a Validated Algorithm for Learning in Decentralized Networks with Possible Adversarial Presence [13.612214163974459]
We introduce the paradigm of validated decentralized learning for undirected networks with heterogeneous data. VALID protocol is the first to achieve a validated learning guarantee. Remarkably, VALID retains optimal performance metrics in adversary-free environments.
arXiv Detail & Related papers (2024-05-12T15:55:43Z)
Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications. We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of approximation schemes with delayed updates under Markovian sampling. Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making. We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z)
Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification [59.81904428056924]
We introduce SMASH: a Score MAtching estimator for learning markedPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of markedPs through score-matching. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
arXiv Detail & Related papers (2023-10-25T02:37:51Z)
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes. The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z)
PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the non-AL equilibrium nonconptotic objective function in two-player zero-sum continuous games. Our novel insights into the particle-based algorithms for continuous distribution strategies are presented.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-data generating processes. We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
Double Robust Bayesian Inference on Average Treatment Effects [2.458652618559425]
We propose a double robust Bayesian inference procedure on the average treatment effect (ATE) under unconfoundedness. For our new Bayesian approach, we first adjust the prior distributions of the conditional mean functions, and then correct the posterior distribution of the resulting ATE.
arXiv Detail & Related papers (2022-11-29T15:32:25Z)
Neighborhood Spatial Aggregation MC Dropout for Efficient Uncertainty-aware Semantic Segmentation in Point Clouds [8.98036662506975]
Uncertainty-aware semantic segmentation of point clouds includes the predictive uncertainty estimation and the uncertainty-guided model optimization. The widely-used MC dropout establishes the distribution by computing the standard deviation of samples using multiple forward propagations. A framework embedded with NSA-MC dropout, a variant of MC dropout, is proposed to establish distributions in just one forward pass.
arXiv Detail & Related papers (2021-12-05T02:22:32Z)
On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively. Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks. Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.