Variance Reduction in Ratio Metrics for Efficient Online Experiments
- URL: http://arxiv.org/abs/2401.04062v1
- Date: Mon, 8 Jan 2024 18:01:09 GMT
- Title: Variance Reduction in Ratio Metrics for Efficient Online Experiments
- Authors: Shubham Baweja, Neeti Pokharna, Aleksei Ustimenko and Olivier Jeunen
- Abstract summary: We apply variance reduction techniques to ratio metrics on a large-scale short-video platform: ShareChat.
Our results show that we can either improve A/B-test confidence in 77% of cases, or can retain the same level of confidence with 30% fewer data points.
- Score: 12.036747050794135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online controlled experiments, such as A/B-tests, are commonly used by modern
tech companies to enable continuous system improvements. Despite their
paramount importance, A/B-tests are expensive: by their very definition, a
percentage of traffic is assigned an inferior system variant. To ensure
statistical significance on top-level metrics, online experiments typically run
for several weeks. Even then, a considerable number of experiments will lead to
inconclusive results (i.e. false negatives, or type-II error). The main culprit
for this inefficiency is the variance of the online metrics. Variance reduction
techniques have been proposed in the literature, but their direct applicability
to commonly used ratio metrics (e.g. click-through rate or user retention) is
limited.
In this work, we successfully apply variance reduction techniques to ratio
metrics on a large-scale short-video platform: ShareChat. Our empirical results
show that we can either improve A/B-test confidence in 77% of cases, or can
retain the same level of confidence with 30% fewer data points. Importantly, we
show that the common approach of including as many covariates as possible in
regression is counter-productive, highlighting that control variates based on
Gradient-Boosted Decision Tree predictors are most effective. We discuss the
practicalities of implementing these methods at scale and showcase the cost
reduction they beget.
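To make the general recipe concrete, the following is a minimal Python sketch (not the authors' implementation) of delta-method linearization of a per-user ratio metric combined with a CUPED-style control-variate adjustment; the synthetic data and the covariate standing in for a pre-experiment GBDT prediction are hypothetical.

```python
import numpy as np

def linearize_ratio(num, den):
    """Delta-method linearization of a ratio metric sum(num) / sum(den).
    The per-user values L_i = (num_i - R * den_i) / mean(den) have, to first
    order, the same variance as the ratio estimator itself."""
    R = num.sum() / den.sum()
    return (num - R * den) / den.mean()

def cuped_adjust(y, x):
    """CUPED-style adjustment: subtract theta * (x - mean(x)), with theta
    chosen to minimize the variance of the adjusted metric."""
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# Hypothetical per-user data: views, clicks, and a covariate that stands in
# for a GBDT prediction of the user's click-through rate built on
# pre-experiment data.
rng = np.random.default_rng(0)
n = 100_000
user_ctr = rng.beta(2, 18, n)                  # persistent user-level CTR
views = rng.poisson(20, n) + 1
clicks = rng.binomial(views, user_ctr)
covariate = user_ctr + rng.normal(0, 0.01, n)  # imperfect prediction of user_ctr

y = linearize_ratio(clicks.astype(float), views.astype(float))
y_adj = cuped_adjust(y, covariate)
print(f"variance reduction: {100 * (1 - y_adj.var() / y.var()):.1f}%")
```

In this sketch the covariate predicts the underlying user CTR almost perfectly, so the printed reduction is optimistic; with a real predictor the achievable gain depends on how well the pre-experiment model tracks the in-experiment metric.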
Related papers
- STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments [22.32661807469984]
We develop a novel framework that integrates the Student's t-distribution with machine learning tools to fit heavy-tailed metrics.
By adopting a variational EM method to optimize the log-likelihood function, we can infer a robust solution that largely eliminates the negative impact of outliers.
Both simulations on synthetic data and long-term empirical results on Meituan experiment platform demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-07-23T09:35:59Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results [1.5293427903448025]
A/B tests allow causal effect estimation without confounding bias and exact statistical inference even in small samples.
Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation to machine-learning models of rich log data from historical users who were not in the experiment.
We show that the gains can be even larger for estimating subgroup effects, hold even when the remnant is unrepresentative of the A/B test sample, and extend to post-stratification population effects estimators.
arXiv Detail & Related papers (2023-06-09T21:54:36Z)
- Accounting for multiplicity in machine learning benchmark performance [0.0]
Using the highest-ranked performance as an estimate of state-of-the-art (SOTA) performance is biased, giving overly optimistic results.
In this article, we provide a probability distribution for the case of multiple classifiers so that known analysis methods can be applied and a better SOTA estimate can be provided.
arXiv Detail & Related papers (2023-03-10T10:32:18Z)
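As a rough, self-contained illustration of the multiplicity effect the entry above describes (not the paper's corrected estimator): when many classifiers with identical true accuracy are scored on the same finite test set, the best observed score systematically overstates the true accuracy. The numbers below are made up.

```python
import numpy as np

# 50 hypothetical classifiers, all with true accuracy 0.80, evaluated on a
# test set of 2,000 examples; repeated over many simulated benchmark draws.
rng = np.random.default_rng(1)
true_acc, n_test, n_classifiers, n_repeats = 0.80, 2_000, 50, 10_000
scores = rng.binomial(n_test, true_acc, size=(n_repeats, n_classifiers)) / n_test

print("mean score of one classifier:", round(scores[:, 0].mean(), 4))        # ~0.80, unbiased
print("mean best-of-50 score:       ", round(scores.max(axis=1).mean(), 4))  # > 0.80, optimistic
```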
- Clustering-based Imputation for Dropout Buyers in Large-scale Online Experimentation [4.753069295451989]
In online experimentation, appropriate metrics (e.g., purchase) provide strong evidence to support hypotheses and enhance the decision-making process.
In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers.
For the analysis of incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors.
arXiv Detail & Related papers (2022-09-09T01:05:53Z)
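A minimal sketch of nearest-neighbour imputation for users with missing metric values, in the spirit of the entry above; it uses scikit-learn's generic KNNImputer rather than the paper's clustering-based procedure, and the small user table is invented for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Columns: sessions, page views, purchase amount; NaN marks dropout buyers
# whose purchase metric was never observed.
X = np.array([
    [5.0, 12.0, 30.0],
    [3.0,  8.0, np.nan],
    [6.0, 15.0, 42.0],
    [2.0,  5.0, np.nan],
    [4.0, 10.0, 25.0],
])
imputer = KNNImputer(n_neighbors=2)   # impute from the two most similar users
print(imputer.fit_transform(X))
```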
- Variance Reduction for Policy-Gradient Methods via Empirical Variance Minimization [69.32510868632988]
Policy-gradient methods in Reinforcement Learning suffer from the high variance of the gradient estimate.
In this paper, we investigate for the first time the performance of the technique called Empirical Variance (EV).
Our experiments indicate that, in terms of variance reduction, EV-based methods perform considerably better than A2C baselines.
arXiv Detail & Related papers (2022-06-14T13:18:49Z)
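To make the variance issue tangible, here is a tiny REINFORCE-style example on a two-armed bandit showing that subtracting a baseline (the simplest control variate) shrinks the variance of the gradient estimate; it illustrates the general idea only, not the paper's Empirical Variance estimator, and all quantities are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.0                               # single logit of a Bernoulli policy
rewards = np.array([1.0, 3.0])            # deterministic reward of each arm

def grad_samples(baseline, n=50_000):
    """Per-sample REINFORCE gradient estimates (r - baseline) * d/dtheta log pi(a)."""
    p1 = 1.0 / (1.0 + np.exp(-theta))     # probability of pulling arm 1
    a = rng.binomial(1, p1, n)            # sampled actions
    grad_logp = a - p1                    # score function of the Bernoulli policy
    return (rewards[a] - baseline) * grad_logp

print("variance without baseline:", grad_samples(0.0).var())
print("variance with mean-reward baseline:", grad_samples(rewards.mean()).var())
```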
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- Newer is not always better: Rethinking transferability metrics, their peculiarities, stability and performance [5.650647159993238]
Fine-tuning of large pre-trained image and language models on small customized datasets has become increasingly popular.
We show that the statistical problems with covariance estimation drive the poor performance of H-score.
We propose a correction and recommend measuring correlation performance against relative accuracy in such settings.
arXiv Detail & Related papers (2021-10-13T17:24:12Z)
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights.
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
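A quick synthetic illustration of why IPS weights inflate variance (a one-step example, not the CounterSample algorithm): the further the logging policy drifts from the target policy, the larger the importance weights and the noisier the per-log reward estimates. All policies and rewards below are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions, n_logs = 10, 100_000
rewards = rng.uniform(0, 1, n_actions)           # hypothetical reward per action
target = np.full(n_actions, 1.0 / n_actions)     # uniform target policy

for eps in (0.5, 0.1, 0.01):
    logging = np.full(n_actions, eps / n_actions)
    logging[0] += 1.0 - eps                      # logging policy concentrates on action 0
    a = rng.choice(n_actions, size=n_logs, p=logging)
    ips = target[a] / logging[a]                 # inverse propensity score weights
    est = ips * rewards[a]                       # per-log IPS reward estimates
    print(f"eps={eps}: mean={est.mean():.3f}, std of per-log terms={est.std():.2f}")
```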
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)