Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests
- URL: http://arxiv.org/abs/2407.01036v2
- Date: Mon, 02 Dec 2024 15:31:12 GMT
- Title: Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests
- Authors: Pallavi Basu, Ron Berman
- Abstract summary: This work develops a decision-theoretic framework for maximizing profits subject to false discovery rate (FDR) control.
We build an empirical Bayes solution for the problem via a greedy knapsack approach.
Our oracle decision rule is valid and optimal for large-scale tests.
- Abstract: A/B testers who conduct large-scale tests often prioritize lifts as the main outcome metric and want to control the costs resulting from false rejections of the null. This work develops a decision-theoretic framework for maximizing profits subject to false discovery rate (FDR) control. We build an empirical Bayes solution for the problem via a greedy knapsack approach. We derive an oracle rule based on ranking the ratio of expected lifts to the cost of wrong rejections using the local false discovery rate (lfdr) statistic. Our oracle decision rule is valid and optimal for large-scale tests. Further, we establish asymptotic validity for the data-driven procedure and demonstrate finite-sample validity in experimental studies. We also demonstrate the merit of the proposed method over other FDR control methods. Finally, we discuss an application to data collected by experiments on the Optimizely platform.
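To make the ranking idea concrete, here is a minimal sketch of the greedy knapsack step, assuming the expected lifts and lfdr statistics have already been estimated (e.g., by an empirical Bayes fit); the function name, the running-mean-lfdr stopping rule, and all parameters are illustrative, not the authors' implementation:

```python
import numpy as np

def rank_by_lifts(expected_lift, lfdr, alpha=0.05):
    """Greedy knapsack sketch (illustrative, not the authors' code):
    rank tests by the ratio of expected lift (benefit) to lfdr (the
    posterior probability that a rejection is a false discovery, i.e.
    its expected cost), then admit tests while the running mean lfdr
    of the rejection set stays below alpha."""
    expected_lift = np.asarray(expected_lift, dtype=float)
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(-expected_lift / np.maximum(lfdr, 1e-12))
    rejected, lfdr_sum = [], 0.0
    for i in order:
        if expected_lift[i] <= 0:
            continue  # rejecting this null yields no expected profit
        if (lfdr_sum + lfdr[i]) / (len(rejected) + 1) <= alpha:
            rejected.append(int(i))
            lfdr_sum += lfdr[i]
    return rejected
```

Keeping the average lfdr of the rejected set below alpha is a standard way to control the (marginal) FDR of that set, since each lfdr value is the posterior probability of a false discovery.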
Related papers
- $t$-Testing the Waters: Empirically Validating Assumptions for Reliable A/B-Testing [3.988614978933934]
A/B-tests are a cornerstone of experimental design on the web, with wide-ranging applications and use-cases.
We propose a practical and efficient method to empirically test whether the $t$-test's assumptions are met and, hence, whether the A/B-test is valid.
arXiv Detail & Related papers (2025-02-07T09:55:24Z)
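The summary above does not spell out the proposed test, so as generic background here is one common way to pre-check the two assumptions in question (approximate normality and comparable variances) with scipy; this is a hedged stand-in, not the paper's method:

```python
import numpy as np
from scipy import stats

def check_t_test_assumptions(a, b, alpha=0.05):
    """Generic pre-checks for a two-sample t-test (a stand-in, not the
    paper's procedure): Shapiro-Wilk for approximate normality of each
    arm, Levene's test for equal variances."""
    checks = {
        "normal_a": stats.shapiro(a).pvalue > alpha,
        "normal_b": stats.shapiro(b).pvalue > alpha,
        "equal_var": stats.levene(a, b).pvalue > alpha,
    }
    # Fall back to Welch's t-test when the equal-variance check fails.
    return checks, stats.ttest_ind(a, b, equal_var=checks["equal_var"])
```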
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
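The closed-form estimator itself is in the paper; as a hedged illustration of the underlying control-variate idea, the sketch below corrects an inverse propensity scoring (IPS) estimate with a baseline b. Because importance weights average to one under the logging policy, subtracting b * (w - 1) preserves unbiasedness, and minimizing the resulting variance gives b = Cov(w*r, w) / Var(w); the function name and plug-in estimates are illustrative:

```python
import numpy as np

def baseline_corrected_ips(rewards, target_probs, logging_probs):
    """Control-variate sketch (illustrative names and plug-ins): since
    E[w] = 1 under the logging policy, subtracting b * (w - 1) keeps
    the IPS estimate unbiased for any b; b = Cov(w*r, w) / Var(w)
    minimizes its variance."""
    w = np.asarray(target_probs) / np.asarray(logging_probs)
    wr = w * np.asarray(rewards)
    C = np.cov(wr, w)                    # 2x2 sample covariance matrix
    b = C[0, 1] / max(C[1, 1], 1e-12)    # variance-minimizing baseline
    return float(np.mean(wr - b * (w - 1.0)))
```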
- Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections [57.67176250198289]
AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test.
We explore schemes and settings more extensively, to identify and recommend efficient choices for practice.
A limitation of the current AWAIRE implementation is its restriction to a small number of candidates.
arXiv Detail & Related papers (2024-02-18T10:13:01Z)
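As a hedged sketch of what "adaptively weighted averages of test statistics" can look like (an illustration in the spirit of AWAIRE, not its implementation): if per-round weights over candidate statistics depend only on earlier rounds, a weighted average of supermartingale increments is itself a valid test supermartingale:

```python
import numpy as np

def adaptively_weighted_statistic(increments, eta=1.0):
    """Illustrative sketch: `increments` is a (rounds, k) array of
    positive per-round multiplicative increments of k candidate test
    supermartingales. Weights for round t use only rounds < t
    (predictable), so the weighted average remains a valid statistic."""
    log_growth = np.zeros(increments.shape[1])
    combined = 1.0
    for step in increments:
        w = np.exp(eta * log_growth)
        w /= w.sum()                      # weights favor past performers
        combined *= float(w @ step)
        log_growth += np.log(step)
    return combined  # large values are evidence against the null
```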
- A Semi-Bayesian Nonparametric Estimator of the Maximum Mean Discrepancy Measure: Applications in Goodness-of-Fit Testing and Generative Adversarial Networks [3.623570119514559]
We propose a semi-Bayesian nonparametric (semi-BNP) procedure for the goodness-of-fit (GOF) test.
Our method introduces a novel Bayesian estimator for the maximum mean discrepancy (MMD) measure.
We demonstrate that our proposed test outperforms frequentist MMD-based methods, achieving lower rates of both falsely rejecting and falsely accepting the null hypothesis.
arXiv Detail & Related papers (2023-03-05T10:36:21Z)
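The Bayesian estimator is the paper's contribution; for orientation, the frequentist quantity it targets, the squared MMD, has a standard unbiased estimate under a Gaussian kernel (the bandwidth choice here is arbitrary):

```python
import numpy as np

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Standard unbiased estimate of squared MMD between samples x
    (n, d) and y (m, d) under a Gaussian kernel; background for the
    quantity the semi-BNP estimator targets, not the paper's method."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    kxx, kyy, kxy = gram(x, x), gram(y, y), gram(x, y)
    n, m = len(x), len(y)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()
```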
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospectively reshuffling participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Optimal Treatment Regimes for Proximal Causal Learning [7.672587258250301]
We propose a novel optimal individualized treatment regime based on outcome and treatment confounding bridges.
We show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature.
arXiv Detail & Related papers (2022-12-19T14:29:25Z)
- Empirical Bayesian Approaches for Robust Constraint-based Causal Discovery under Insufficient Data [38.883810061897094]
Causal discovery methods assume data sufficiency, which may not hold in many real-world datasets.
We propose Bayesian-augmented frequentist independence tests to improve the performance of constraint-based causal discovery methods under insufficient data.
Experiments show significant performance improvement in terms of both accuracy and efficiency over SOTA methods.
arXiv Detail & Related papers (2022-06-16T21:08:49Z)
- Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method that integrates knockoff features, error-based feature importance statistics, and a stepdown procedure.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z)
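The paper's novelty lies in the error-based importance statistics and the stepdown procedure; those statistics ultimately feed a thresholding rule, and the standard knockoff+ filter below shows the kind of selection step involved (a generic sketch, not the paper's exact procedure):

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Standard knockoff+ filter (generic): W_j are feature importance
    statistics that are sign-symmetric under the null. Select features
    above the smallest threshold t whose estimated false discovery
    proportion (1 + #{W_j <= -t}) / #{W_j >= t} is at most q."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)       # nothing passes at level q
```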
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized controlled trials (RCTs) is the lack of ground truth (or a validation set) against which to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers that has minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
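A hedged sketch of the setting: each logging policy contributes one stratum of fixed size, and per-stratum IPS estimates can be combined. The proportional weighting below is the naive consistent combination; the paper's point is to derive the variance-efficient weighting instead:

```python
import numpy as np

def stratified_ips(strata):
    """Sketch: each stratum is a (weights, rewards) pair collected
    under one logging policy, with weights = target / logging action
    probabilities. Proportional weighting of per-stratum IPS estimates
    is the naive consistent combination; the paper derives the
    variance-efficient weighting instead (not shown here)."""
    total = sum(len(r) for _, r in strata)
    return sum(
        (len(r) / total) * float(np.mean(np.asarray(w) * np.asarray(r)))
        for w, r in strata
    )
```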
- Understanding and Mitigating the Limitations of Prioritized Experience Replay [46.663239542920984]
Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains.
We show equivalence between the error-based prioritized sampling method for mean squared error and uniform sampling for cubic power loss.
We then provide theoretical insight into why it improves convergence rate upon uniform sampling during early learning.
arXiv Detail & Related papers (2020-07-19T03:10:02Z)
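The stated equivalence is easy to verify numerically: with sampling priorities proportional to |delta_i|, the expected mean-squared-error gradient matches the uniform-sampling gradient of the cubic power loss |delta|^3 / 3 up to a constant (a self-contained check, with illustrative variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = rng.normal(size=1000)             # per-sample errors delta_i

# Expected MSE gradient under error-prioritized sampling, p_i ~ |delta_i|
p = np.abs(delta) / np.abs(delta).sum()
g_prioritized = np.sum(p * delta)         # E_p[ d/dpred (delta^2 / 2) ]

# Expected cubic-power-loss gradient under uniform sampling
g_cubic = np.mean(np.abs(delta) * delta)  # E_unif[ d/dpred (|delta|^3 / 3) ]

# Identical up to the constant n / sum|delta_i|
print(g_prioritized, g_cubic * len(delta) / np.abs(delta).sum())
```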
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.