Post Launch Evaluation of Policies in a High-Dimensional Setting
- URL: http://arxiv.org/abs/2501.00119v1
- Date: Mon, 30 Dec 2024 19:35:29 GMT
- Title: Post Launch Evaluation of Policies in a High-Dimensional Setting
- Authors: Shima Nassiri, Mohsen Bayati, Joe Cooprider
- Abstract summary: A/B tests, also known as randomized controlled experiments (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions.
This paper explores practical considerations in applying methodologies inspired by "synthetic control" as an alternative to traditional A/B testing.
Synthetic control methods leverage data from unaffected units to estimate counterfactual outcomes for treated units.
- Score: 4.710921988115686
- Abstract: A/B tests, also known as randomized controlled experiments (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. However, these tests can be costly in terms of time and resources, potentially exposing users, customers, or other test subjects (units) to inferior options. This paper explores practical considerations in applying methodologies inspired by "synthetic control" as an alternative to traditional A/B testing in settings with very large numbers of units, involving up to hundreds of millions of units, which is common in modern applications such as e-commerce and ride-sharing platforms. This method is particularly valuable in settings where the treatment affects only a subset of units, leaving many units unaffected. In these scenarios, synthetic control methods leverage data from unaffected units to estimate counterfactual outcomes for treated units. After the treatment is implemented, these estimates can be compared to actual outcomes to measure the treatment effect. A key challenge in creating accurate counterfactual outcomes is interpolation bias, a well-documented phenomenon that occurs when control units differ significantly from treated units. To address this, we propose a two-phase approach: first using nearest neighbor matching based on unit covariates to select similar control units, then applying supervised learning methods suitable for high-dimensional data to estimate counterfactual outcomes. Testing using six large-scale experiments demonstrates that this approach successfully improves estimate accuracy. However, our analysis reveals that machine learning bias -- which arises from methods that trade off bias for variance reduction -- can impact results and affect conclusions about treatment effects. We document this bias in large-scale experimental settings and propose effective de-biasing techniques to address this challenge.
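The two-phase approach described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration on synthetic data, not the authors' implementation; the variable names and model choices (k-nearest-neighbour matching, gradient boosting) are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic panel: covariates, pre-period outcomes, one post-period outcome.
# Treated units differ in covariates and receive a true effect of +0.5.
n_c, n_t, n_cov, n_pre = 5000, 500, 10, 8
beta = rng.normal(size=(n_cov, n_pre))
w = rng.normal(size=n_pre)
X_c = rng.normal(size=(n_c, n_cov))
X_t = rng.normal(loc=0.3, size=(n_t, n_cov))
pre_c = X_c @ beta + rng.normal(scale=0.1, size=(n_c, n_pre))
pre_t = X_t @ beta + rng.normal(scale=0.1, size=(n_t, n_pre))
post_c = pre_c @ w + rng.normal(scale=0.1, size=n_c)
post_t = pre_t @ w + 0.5 + rng.normal(scale=0.1, size=n_t)

# Phase 1: nearest-neighbour matching on unit covariates selects controls
# that resemble treated units, limiting interpolation bias.
nn = NearestNeighbors(n_neighbors=25).fit(X_c)
idx = np.unique(nn.kneighbors(X_t, return_distance=False).ravel())

# Phase 2: a supervised learner maps pre-period outcomes to the post-period
# outcome on the matched controls, then predicts treated counterfactuals.
model = GradientBoostingRegressor().fit(pre_c[idx], post_c[idx])
effect = (post_t - model.predict(pre_t)).mean()
print(f"estimated treatment effect: {effect:.3f}")  # close to 0.5
```

A production version would also de-bias the learner's predictions, since methods like boosting trade bias for variance, which is exactly the machine-learning bias the abstract warns about.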
Related papers
- A Simple Model to Estimate Sharing Effects in Social Networks [3.988614978933934]
We propose a simple Markov Decision Process (MDP)-based model describing user sharing behaviour in social networks.
We derive an unbiased estimator for treatment effects under this model, and demonstrate through reproducible synthetic experiments that it outperforms existing methods by a significant margin.
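The snippet below is not the paper's MDP estimator, only a hypothetical simulation of why sharing matters: spillover reaches both arms, so a naive difference-in-means recovers the direct effect but understates the effect of a full launch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
treated = rng.random(n) < 0.5
direct, share_lift = 0.2, 0.1

y = rng.normal(loc=1.0, size=n) + direct * treated
# Each treated user shares with one random user, lifting that user's
# outcome regardless of the recipient's own assignment.
np.add.at(y, rng.integers(0, n, size=int(treated.sum())), share_lift)

naive = y[treated].mean() - y[~treated].mean()
print(f"naive: {naive:.3f} vs. global launch effect: {direct + share_lift}")
```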
arXiv Detail & Related papers (2024-09-16T13:32:36Z)
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the *confounded pure exploration transductive linear bandit* (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
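When direct assignment is impossible, the randomized encouragement serves as an instrument. A textbook Wald estimate (not the CPET-LB algorithm itself) looks like this; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
encouraged = rng.random(n) < 0.5                         # randomized instrument
takeup = rng.random(n) < np.where(encouraged, 0.7, 0.3)  # imperfect compliance
y = 1.0 + 0.4 * takeup + rng.normal(size=n)              # true effect: 0.4

# Wald estimator: intent-to-treat effect divided by the take-up lift.
itt = y[encouraged].mean() - y[~encouraged].mean()
lift = takeup[encouraged].mean() - takeup[~encouraged].mean()
print(f"IV estimate: {itt / lift:.3f}")                  # approx. 0.4
```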
arXiv Detail & Related papers (2024-06-15T20:54:48Z)
- Variance Reduction in Ratio Metrics for Efficient Online Experiments [12.036747050794135]
We apply variance reduction techniques to ratio metrics on a large-scale short-video platform: ShareChat.
Our results show that we can either improve A/B-test confidence in 77% of cases, or retain the same level of confidence with 30% fewer data points.
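A common recipe matching this description is sketched below under assumptions (the platform's actual pipeline is not shown here): linearize the ratio metric with the delta method, then apply CUPED against a pre-experiment covariate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Ratio metric sum(clicks)/sum(views): linearize per user via the delta
# method so user-level variance-reduction machinery applies.
views = rng.poisson(10.0, size=n) + 1
clicks = rng.binomial(views, 0.3).astype(float)
ratio = clicks.sum() / views.sum()
lin = (clicks - ratio * views) / views.mean()

# CUPED: regress out a pre-experiment covariate (here, a noisy proxy for
# each user's historical behaviour).
x_pre = lin + rng.normal(scale=lin.std() * 0.5, size=n)
theta = np.cov(x_pre, lin)[0, 1] / x_pre.var()
lin_cv = lin - theta * (x_pre - x_pre.mean())
print(f"variance before: {lin.var():.4f}, after: {lin_cv.var():.4f}")
```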
arXiv Detail & Related papers (2024-01-08T18:01:09Z)
- Causal Message Passing for Experiments with Unknown and General Network Interference [5.294604210205507]
We introduce a new framework to accommodate complex and unknown network interference.
Our framework, termed causal message-passing, is grounded in high-dimensional approximate message passing methodology.
We demonstrate the effectiveness of this approach across five numerical scenarios.
arXiv Detail & Related papers (2023-11-14T17:31:50Z)
- Individualized Policy Evaluation and Learning under Clustered Network Interference [4.560284382063488]
We consider the problem of evaluating and learning an optimal individualized treatment rule (ITR) under clustered network interference.
We propose an estimator that can be used to evaluate the empirical performance of an ITR.
We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies.
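Setting aside the clustered-interference machinery, the evaluation step can be illustrated with a standard inverse-propensity-weighted value estimate for an ITR; the data and rule below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)                           # unit covariate
a = (rng.random(n) < 0.5).astype(int)            # randomized treatment, p = 0.5
y = 1.0 + a * x + rng.normal(scale=0.5, size=n)  # effect of treating is +x

def itr_value(rule, a, y, p=0.5):
    """Inverse-propensity-weighted estimate of the mean outcome under `rule`."""
    prop = np.where(rule == 1, p, 1 - p)
    return np.mean((rule == a) * y / prop)

rule = (x > 0).astype(int)                       # candidate ITR: treat when x > 0
print(f"ITR value: {itr_value(rule, a, y):.3f}")                   # beats treat-all
print(f"treat-all value: {itr_value(np.ones(n, int), a, y):.3f}")  # approx. 1.0
```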
arXiv Detail & Related papers (2023-11-04T17:58:24Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
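A typical measurement in such evaluations is a group performance gap, for example the equal-opportunity (true-positive-rate) gap sketched below; this is an assumed illustration, not the paper's benchmark.

```python
import numpy as np

def tpr_gap(y_true, y_pred, group):
    """Equal-opportunity gap: spread of true-positive rates across groups."""
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

rng = np.random.default_rng(5)
n = 10_000
group = rng.integers(0, 2, size=n)
y_true = rng.integers(0, 2, size=n)

# Hypothetical classifier whose recall depends on group membership.
recall = np.where(group == 0, 0.9, 0.6)
y_pred = ((rng.random(n) < recall) & (y_true == 1)).astype(int)
print(f"TPR gap: {tpr_gap(y_true, y_pred, group):.3f}")  # approx. 0.3
```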
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
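One way to read "cross-validation-like" (an assumption about the method, not a reproduction of it): repeatedly split the RCT, let the held-out difference-in-means act as a noisy but unbiased target, and score candidate estimators against it.

```python
import numpy as np

def assess(estimator, a, y, x, n_splits=10, seed=0):
    """Score an estimator against held-out difference-in-means targets."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_splits):
        fold = rng.random(len(y)) < 0.5
        est = estimator(a[fold], y[fold], x[fold])
        # Held-out difference-in-means is unbiased, hence a usable noisy target.
        target = y[~fold][a[~fold] == 1].mean() - y[~fold][a[~fold] == 0].mean()
        errs.append((est - target) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(6)
n = 50_000
x = rng.standard_t(df=3, size=n)                  # heavy-tailed covariate
a = (rng.random(n) < 0.5).astype(int)
y = x + 0.1 * a + rng.standard_t(df=3, size=n)    # heavy-tailed outcome

dim = lambda a, y, x: y[a == 1].mean() - y[a == 0].mean()
adj = lambda a, y, x: dim(a, y - x * (np.cov(x, y)[0, 1] / x.var()), x)
print("difference-in-means score:", assess(dim, a, y, x))
print("regression-adjusted score:", assess(adj, a, y, x))
```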
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference [73.23326654892963]
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network.
Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs.
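A toy version of the idea using networkx: summarize each unit's neighborhood by counts of small subgraphs and group units with identical counts. The signature here (nodes, edges, triangles) is a deliberate simplification of the paper's unique-subgraph counts.

```python
import networkx as nx
from collections import defaultdict

def neighborhood_signature(G, node, radius=1):
    """Counts of simple subgraphs in the node's ego graph (a crude stand-in
    for the paper's unique-subgraph counts)."""
    ego = nx.ego_graph(G, node, radius=radius)
    return (ego.number_of_nodes(), ego.number_of_edges(),
            sum(nx.triangles(ego).values()) // 3)

G = nx.erdos_renyi_graph(200, 0.05, seed=7)
buckets = defaultdict(list)
for v in G.nodes:
    buckets[neighborhood_signature(G, v)].append(v)

# Units sharing a signature are "almost exact" matches on local structure.
groups = [vs for vs in buckets.values() if len(vs) > 1]
print(f"{len(groups)} match groups covering "
      f"{sum(len(vs) for vs in groups)} of {G.number_of_nodes()} nodes")
```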
arXiv Detail & Related papers (2020-03-02T15:21:20Z)
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework [68.96770035057716]
A/B testing is a business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
arXiv Detail & Related papers (2020-02-05T10:25:02Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
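The core objective in this family of methods pairs an outcome loss with a penalty on the distance between treated and control representations. Below is a numpy sketch with the simplest such penalty (difference of representation means); the architecture and names are assumptions, not the paper's algorithm.

```python
import numpy as np

def linear_mmd(phi_t, phi_c):
    """Squared distance between treated/control representation means -- the
    simplest integral probability metric used to balance groups."""
    return float(np.sum((phi_t.mean(axis=0) - phi_c.mean(axis=0)) ** 2))

def objective(W, x, a, y, alpha=1.0):
    """Hypothetical CFR-style loss: outcome error + representation imbalance."""
    phi = np.tanh(x @ W)                       # shared representation
    pred = np.empty_like(y)
    for arm in (0, 1):                         # one linear head per arm
        m = a == arm
        head, *_ = np.linalg.lstsq(phi[m], y[m], rcond=None)
        pred[m] = phi[m] @ head
    mse = np.mean((pred - y) ** 2)
    return mse + alpha * linear_mmd(phi[a == 1], phi[a == 0])

rng = np.random.default_rng(8)
x = rng.normal(size=(1000, 5))
a = (rng.random(1000) < 0.5).astype(int)
y = x[:, 0] + 0.5 * a + rng.normal(scale=0.3, size=1000)
W = rng.normal(scale=0.5, size=(5, 4))
print("loss with balance penalty:", objective(W, x, a, y))
```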
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.