Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
- URL: http://arxiv.org/abs/2407.19618v3
- Date: Tue, 09 Sep 2025 14:04:18 GMT
- Title: Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality
- Authors: Shuze Chen, David Simchi-Levi, Chonghuan Wang
- Abstract summary: We develop optimal inference techniques for general A/B testing in Markov Decision Processes. We propose methods to harness the localized structure by sharing information on the non-targeted states. We show that all such estimators can benefit from variance reduction through information sharing without increasing their bias.
- Score: 16.36651676133996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using randomized experiments to evaluate the effect of short-term treatments on short-term outcomes is well understood and has become the gold standard in industrial practice. However, as service systems become increasingly dynamic and personalized, much focus is shifting toward maximizing long-term outcomes, such as customer lifetime value, through lifetime exposure to interventions. Our goal is to assess the impact of treatment and control policies on long-term outcomes from relatively short-term observations, such as those generated by A/B testing. A key managerial observation is that many practical treatments are local, affecting only targeted states while leaving the rest of the policy unchanged. This paper rigorously investigates whether and how such locality can be exploited to improve the estimation of long-term effects in Markov Decision Processes (MDPs), a fundamental model of dynamic systems. We first develop optimal inference techniques for general A/B testing in MDPs and establish the corresponding efficiency bounds. We then propose methods to harness the localized structure by sharing information on the non-targeted states. Our new estimator achieves a linear reduction with the number of test arms for a major part of the variance without sacrificing unbiasedness, and it matches a tighter variance lower bound that accounts for locality. Furthermore, we extend our framework to a broad class of differentiable estimators, which encompasses many widely used approaches in practice. We show that all such estimators can benefit from variance reduction through information sharing without increasing their bias. Together, these results provide both theoretical foundations and practical tools for conducting efficient experiments in dynamic service systems with local treatments.
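The information-sharing idea in the abstract can be illustrated with a toy simulation. The sketch below is a hypothetical two-state Markov chain, not the paper's estimator: the treatment is local to state 0 (state 1's rewards are identical in both arms), so samples from the non-targeted state can be pooled across arms; the pooled term then cancels exactly in the effect estimate, removing its variance contribution. All numbers are made up for illustration.

```python
import random

def simulate(reward_s0, steps, rng):
    """Toy 2-state Markov chain. The treatment is local: it only changes
    the mean reward at the targeted state 0; state 1 is untouched."""
    s = 0
    rewards = {0: [], 1: []}
    for _ in range(steps):
        if s == 0:
            r = reward_s0 + rng.gauss(0, 1)
        else:
            r = 1.0 + rng.gauss(0, 1)   # non-targeted: identical in both arms
        rewards[s].append(r)
        s = 1 - s if rng.random() < 0.5 else s
    return rewards

rng = random.Random(0)
arm_T = simulate(2.0, 2000, rng)  # treatment arm (hypothetical uplift at state 0)
arm_C = simulate(1.0, 2000, rng)  # control arm

def mean(xs):
    return sum(xs) / len(xs)

# With symmetric 0.5/0.5 transitions the stationary distribution is uniform,
# so the long-run average reward is 0.5 * E[r | s=0] + 0.5 * E[r | s=1].
naive_T = 0.5 * mean(arm_T[0]) + 0.5 * mean(arm_T[1])
naive_C = 0.5 * mean(arm_C[0]) + 0.5 * mean(arm_C[1])

# Information sharing: pool the non-targeted state's samples across arms.
# The pooled term cancels in the difference, so its noise never enters
# the treatment-effect estimate.
pooled_s1 = mean(arm_T[1] + arm_C[1])
shared_T = 0.5 * mean(arm_T[0]) + 0.5 * pooled_s1
shared_C = 0.5 * mean(arm_C[0]) + 0.5 * pooled_s1
effect = shared_T - shared_C
```

Here `effect` reduces exactly to half the difference of the targeted-state means, which is the sense in which sharing removes the non-targeted variance without adding bias.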
Related papers
- Practical Improvements of A/B Testing with Off-Policy Estimation [51.25970890274447]
We introduce a family of unbiased off-policy estimators that achieves lower variance than the standard approach. Our theoretical analysis and experimental results validate the effectiveness and practicality of the proposed method.
arXiv Detail & Related papers (2025-06-12T13:11:01Z) - Post Launch Evaluation of Policies in a High-Dimensional Setting [4.710921988115686]
A/B tests, also known as randomized controlled trials (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. This paper explores practical considerations in applying methodologies inspired by "synthetic control". Synthetic control methods leverage data from unaffected units to estimate counterfactual outcomes for treated units.
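The synthetic-control idea described above can be sketched in a few lines. This is a minimal illustration with made-up numbers, using plain least squares for the donor weights rather than the simplex-constrained weights typical of the synthetic-control literature:

```python
import numpy as np

# Pre-treatment outcomes: rows = time periods, columns = donor (control) units
donors_pre = np.array([[1.0, 2.0, 0.5],
                       [1.2, 2.1, 0.6],
                       [1.1, 2.3, 0.4],
                       [1.3, 2.2, 0.7]])
treated_pre = np.array([1.5, 1.7, 1.6, 1.8])  # treated unit before launch

# Fit weights so a combination of donors tracks the treated unit pre-launch
w, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)

# Post-launch: the weighted donor combination serves as the counterfactual
donors_post = np.array([[1.2, 2.4, 0.6],
                        [1.4, 2.3, 0.8]])
treated_post = np.array([2.6, 2.9])           # observed after launch
counterfactual = donors_post @ w              # predicted no-treatment path
effect = treated_post - counterfactual        # estimated effect per period
```

The donor units are assumed to be unaffected by the launch; the gap between the observed and synthetic paths is the effect estimate.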
arXiv Detail & Related papers (2024-12-30T19:35:29Z) - Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets [51.2467404472005]
We propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice.
arXiv Detail & Related papers (2024-12-16T07:39:46Z) - Estimating the treatment effect over time under general interference through deep learner integrated TMLE [7.2615408834692685]
We introduce DeepNetTMLE, a deep-learning-enhanced Targeted Maximum Likelihood Estimation (TMLE) method.
DeepNetTMLE mitigates bias from time-varying confounders under general interference.
We show that DeepNetTMLE achieves lower bias and more precise confidence intervals in counterfactual estimates.
arXiv Detail & Related papers (2024-12-06T06:09:43Z) - Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources [20.99198458867724]
Policymakers rarely have access to data from a randomized controlled trial (RCT) that would enable accurate estimates of which individuals would benefit more from the intervention. Practitioners instead commonly use a technique termed "risk-based targeting", where the model is used only to predict each individual's status quo outcome. There is currently almost no empirical evidence to inform which choices lead to the most effective machine-learning-informed targeting strategies.
arXiv Detail & Related papers (2024-11-11T22:36:50Z) - Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z) - Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer [7.451436112917229]
We propose a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings.
Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning.
Our method also facilitates statistical inference by enabling the provision of 95% confidence intervals grounded in statistical theory.
arXiv Detail & Related papers (2024-04-05T20:56:15Z) - Individualized Policy Evaluation and Learning under Clustered Network Interference [3.8601741392210434]
We consider the problem of evaluating and learning an optimal individualized treatment rule (ITR) under clustered network interference. We propose an estimator that can be used to evaluate the empirical performance of an ITR. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to improved performance of learned policies.
arXiv Detail & Related papers (2023-11-04T17:58:24Z) - Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes.
By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of IPWE-based methods.
arXiv Detail & Related papers (2023-10-30T06:35:31Z) - Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
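One simple baseline for the proxy-selection problem above, far cruder than the statistical framework the paper introduces, is to rank candidate proxies by how well their estimated treatment effects correlate with the long-term metric across past experiments. The metric names and numbers below are hypothetical:

```python
import numpy as np

# Estimated treatment effects from six past experiments (one entry each):
# two candidate short-term proxies and the long-term metric of interest
proxy_clicks = np.array([0.8, 0.1, 0.5, 0.9, 0.2, 0.7])
proxy_dwell  = np.array([0.4, 0.2, 0.6, 0.7, 0.3, 0.5])
long_term    = np.array([0.5, 0.1, 0.6, 0.8, 0.2, 0.6])

def corr(a, b):
    """Pearson correlation between two effect vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Score each proxy by its correlation with the long-term metric's effects
scores = {"clicks": corr(proxy_clicks, long_term),
          "dwell": corr(proxy_dwell, long_term)}
best_proxy = max(scores, key=scores.get)
```

A correlation-based heuristic like this ignores estimation noise in the per-experiment effects, which is one motivation for the more careful framework the paper proposes.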
arXiv Detail & Related papers (2023-09-14T17:43:02Z) - Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment
Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z) - B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under
Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
arXiv Detail & Related papers (2023-04-20T18:07:19Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource
Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - TCFimt: Temporal Counterfactual Forecasting from Individual Multiple
Treatment Perspective [50.675845725806724]
We propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt).
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias, and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method outperforms state-of-the-art methods in predicting future outcomes under specific treatments and in choosing the optimal treatment type and timing.
arXiv Detail & Related papers (2022-12-17T15:01:05Z) - A Reinforcement Learning Approach to Estimating Long-term Treatment
Effects [13.371851720834918]
A limitation with randomized experiments is that they do not easily extend to measure long-term effects.
We take a reinforcement learning (RL) approach that estimates the average reward in a Markov process.
Motivated by real-world scenarios where the observed state transition is nonstationary, we develop a new algorithm for a class of nonstationary problems.
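In the stationary case, the average-reward idea above can be sketched as a model-based estimator: fit an empirical transition matrix from a short logged trajectory, compute its stationary distribution, and weight the per-state mean rewards accordingly. The chain and rewards below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state chain; we only observe a short experiment window
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
r = np.array([1.0, 3.0])  # true mean reward per state

# Simulate a short trajectory and collect transition counts / rewards
T, s = 500, 0
counts = np.zeros((2, 2))
reward_sums = np.zeros(2)
visits = np.zeros(2)
for _ in range(T):
    reward_sums[s] += r[s] + rng.normal()
    visits[s] += 1
    s_next = rng.choice(2, p=P[s])
    counts[s, s_next] += 1
    s = s_next

# Model-based long-run estimate: empirical transition matrix ->
# stationary distribution (leading left eigenvector) -> weighted mean reward
P_hat = counts / counts.sum(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eig(P_hat.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
avg_reward = pi @ (reward_sums / visits)
```

This extrapolates a long-term quantity (the average reward) from short-term observations; the true value here is 13/7, since the stationary distribution of `P` is (4/7, 3/7).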
arXiv Detail & Related papers (2022-10-14T05:33:19Z) - SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event
Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z) - Stochastic Intervention for Causal Inference via Reinforcement Learning [7.015556609676951]
Central to causal inference is the treatment effect estimation of intervention strategies.
Existing methods are mostly restricted to the deterministic treatment and compare outcomes under different treatments.
We propose a new effective framework to estimate the treatment effect on intervention.
arXiv Detail & Related papers (2021-05-28T00:11:22Z) - Generalization Bounds and Representation Learning for Estimation of
Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.