Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on
an Online Educational Platform: New Data and New Results
- URL: http://arxiv.org/abs/2306.06273v1
- Date: Fri, 9 Jun 2023 21:54:36 GMT
- Title: Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on
an Online Educational Platform: New Data and New Results
- Authors: Adam C. Sales, Ethan B. Prihar, Johann A. Gagnon-Bartsch, Neil T.
Heffernan
- Abstract summary: A/B tests allow causal effect estimation without confounding bias and exact statistical inference even in small samples.
Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation to machine-learning models of rich log data from historical users who were not in the experiment.
We show that the gains can be even larger for estimating subgroup effects, hold even when the remnant is unrepresentative of the A/B test sample, and extend to post-stratification population effects estimators.
- Score: 1.5293427903448025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Randomized A/B tests within online learning platforms represent an exciting
direction in learning sciences. With minimal assumptions, they allow causal
effect estimation without confounding bias and exact statistical inference even
in small samples. However, often experimental samples and/or treatment effects
are small, A/B tests are underpowered, and effect estimates are overly
imprecise. Recent methodological advances have shown that power and statistical
precision can be substantially boosted by coupling design-based causal
estimation to machine-learning models of rich log data from historical users
who were not in the experiment. Estimates using these techniques remain
unbiased and inference remains exact without any additional assumptions. This
paper reviews those methods and applies them to a new dataset including over
250 randomized A/B comparisons conducted within ASSISTments, an online learning
platform. We compare results across experiments using four novel deep-learning
models of auxiliary data and show that incorporating auxiliary data into causal
estimates is roughly equivalent to increasing the sample size by 20% on
average, or as much as 50-80% in some cases, relative to t-tests, and by about
10% on average, or as much as 30-50%, compared to cutting-edge machine
learning unbiased estimates that use only data from the experiments. We show
that the gains can be even larger for estimating subgroup effects, hold even
when the remnant (the pool of historical users outside the experiment) is
unrepresentative of the A/B test sample, and extend to
post-stratification population effects estimators.
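To make the mechanism concrete, the following is a minimal sketch of the simplest version of this idea, not the authors' implementation: a model is trained only on remnant users (historical users outside the experiment) to predict outcomes from log data, and the design-based difference-in-means is then applied to the residuals within the A/B test. Because the predictions are fixed with respect to treatment assignment, the estimator remains unbiased under randomization, while its variance shrinks to the extent the model explains the outcomes. All names below are illustrative.

```python
import numpy as np

def remnant_adjusted_ate(y, z, y_hat):
    """Design-based difference-in-means applied to residuals.

    y     : observed outcomes in the A/B test (1-D array)
    z     : treatment indicator, 1 = treatment arm, 0 = control arm
    y_hat : outcome predictions from a model fit ONLY on remnant
            (non-experimental) users, so y_hat does not depend on z
    Returns (point_estimate, standard_error).
    """
    y, z, y_hat = (np.asarray(a, dtype=float) for a in (y, z, y_hat))
    r = y - y_hat                       # residualize with remnant predictions
    r_t, r_c = r[z == 1], r[z == 0]
    est = r_t.mean() - r_c.mean()       # unbiased: y_hat ignores assignment
    se = np.sqrt(r_t.var(ddof=1) / len(r_t) + r_c.var(ddof=1) / len(r_c))
    return est, se

# Setting y_hat to zeros recovers the ordinary difference-in-means (t-test);
# the precision gain is roughly var(y) / var(y - y_hat), which the paper
# reports as an equivalent increase in sample size.
```

The estimators studied in the paper go further, for example combining remnant predictions with within-experiment covariate adjustment and post-stratification, but they share this structure of plugging fixed auxiliary predictions into a design-based estimator.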
Related papers
- Uncertainty Measurement of Deep Learning System based on the Convex Hull of Training Sets [0.13265175299265505]
We propose To-hull Uncertainty and Closure Ratio, which measure the uncertainty of a trained model based on the convex hull of its training data.
They observe the positional relation between the convex hull of the learned data and an unseen sample and infer how far the sample extrapolates beyond the convex hull.
arXiv Detail & Related papers (2024-05-25T06:25:24Z)
- The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes [30.30769701138665]
We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data.
Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem.
We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
arXiv Detail & Related papers (2024-02-14T03:43:05Z)
- Variance Reduction in Ratio Metrics for Efficient Online Experiments [12.036747050794135]
We apply variance reduction techniques to ratio metrics on a large-scale short-video platform: ShareChat.
Our results show that we can either improve A/B-test confidence in 77% of cases, or retain the same level of confidence with 30% fewer data points (a generic covariate-adjustment sketch of this kind of variance reduction appears after this list).
arXiv Detail & Related papers (2024-01-08T18:01:09Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
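As referenced in the Variance Reduction in Ratio Metrics entry above, the sketch below illustrates the generic covariate-adjustment (CUPED-style) variance-reduction idea that such online-experimentation work builds on: a pre-experiment metric is used as a control variate, so the adjusted outcome has lower variance but the same expected treatment effect. This is a generic illustration, not that paper's ratio-metric estimator; the function names and the choice of covariate are assumptions.

```python
import numpy as np

def cuped_adjust(y, x_pre):
    """Subtract from y the part explained by a pre-experiment covariate
    x_pre (e.g., the same metric measured before the experiment).
    theta is the pooled OLS slope of y on x_pre."""
    y, x_pre = np.asarray(y, float), np.asarray(x_pre, float)
    theta = np.cov(y, x_pre)[0, 1] / x_pre.var(ddof=1)
    return y - theta * (x_pre - x_pre.mean())

def diff_in_means(y, z):
    """Plain two-sample estimate with a Neyman-style standard error."""
    y, z = np.asarray(y, float), np.asarray(z)
    y_t, y_c = y[z == 1], y[z == 0]
    est = y_t.mean() - y_c.mean()
    se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
    return est, se

# Usage: compare the unadjusted and adjusted analyses of one experiment.
# est_raw, se_raw = diff_in_means(y, z)
# est_adj, se_adj = diff_in_means(cuped_adjust(y, x_pre), z)
# Variance falls by roughly a factor of (1 - corr(y, x_pre) ** 2).
```

Unlike the remnant-based approach sketched above, theta here is estimated from the experimental data itself, so unbiasedness holds only approximately, with a bias that is negligible in large samples.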