Experimenting on Markov Decision Processes with Local Treatments
- URL: http://arxiv.org/abs/2407.19618v2
- Date: Fri, 18 Oct 2024 03:19:30 GMT
- Title: Experimenting on Markov Decision Processes with Local Treatments
- Authors: Shuze Chen, David Simchi-Levi, Chonghuan Wang
- Abstract summary: We investigate randomized experiments within dynamical systems modeled as Markov Decision Processes (MDPs).
Our goal is to assess the impact of treatment and control policies on long-term cumulative rewards from relatively short-term observations.
- Score: 13.182388658918502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Utilizing randomized experiments to evaluate the effect of short-term treatments on short-term outcomes is well understood and has become the gold standard in industrial practice. However, as service systems become increasingly dynamic and personalized, much focus is shifting toward maximizing long-term cumulative outcomes, such as customer lifetime value, through lifetime exposure to interventions. To bridge this gap, we investigate randomized experiments within dynamical systems modeled as Markov Decision Processes (MDPs). Our goal is to assess the impact of treatment and control policies on long-term cumulative rewards from relatively short-term observations. We first develop optimal inference techniques for assessing the effects of general treatment patterns. Recognizing that many real-world treatments tend to be fine-grained and localized for practical efficiency and operational convenience, we then propose methods that harness this localized structure by sharing information on the non-targeted states. Our new estimator effectively overcomes the variance lower bound for general treatments while matching the more stringent lower bound that incorporates the local treatment structure. Moreover, for a major part of the variance, our estimator optimally achieves a reduction that scales linearly with the number of test arms. Finally, we explore scenarios with perfect knowledge of the control arm and design estimators that further improve inference efficiency.
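The core task above — comparing a treatment policy against a control policy on long-run average reward using only short trajectories — can be illustrated with a toy plug-in estimator. The sketch below is a minimal, hypothetical illustration (a simple model-based estimator on a 3-state chain, with all numbers invented), not the paper's optimal procedure; it also shows, in the crudest form, the local-treatment idea of pooling data from non-targeted states across arms.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(P, r, T, s0=0):
    """Roll out a length-T trajectory of a finite Markov chain with
    state-dependent mean rewards (the chain induced by one fixed policy)."""
    S = P.shape[0]
    s, states, rewards = s0, [], []
    for _ in range(T):
        states.append(s)
        rewards.append(r[s] + rng.normal(0, 0.1))  # noisy reward
        s = rng.choice(S, p=P[s])
    return np.array(states), np.array(rewards)

def fit_counts(states, rewards, S):
    """Empirical transition counts, reward sums, and visit counts per state."""
    trans, rsum, visits = np.zeros((S, S)), np.zeros(S), np.zeros(S)
    for t in range(len(states) - 1):
        s, s2 = states[t], states[t + 1]
        trans[s, s2] += 1
        rsum[s] += rewards[t]
        visits[s] += 1
    return trans, rsum, visits

def avg_reward(trans, rsum, visits, S):
    """Plug-in long-run average reward: stationary distribution of the
    estimated chain, weighted by estimated per-state mean rewards."""
    P_hat = np.where(visits[:, None] > 0,
                     trans / np.maximum(visits, 1)[:, None], 1.0 / S)
    r_hat = rsum / np.maximum(visits, 1)
    w, vecs = np.linalg.eig(P_hat.T)
    pi = np.abs(np.real(vecs[:, np.argmax(np.real(w))]))
    return (pi / pi.sum()) @ r_hat

# toy 3-state chain; the treatment is "local": it only alters state 0
S = 3
P_ctrl = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
r_ctrl = np.array([1.0, 0.5, 0.2])
P_trt, r_trt = P_ctrl.copy(), r_ctrl.copy()
P_trt[0], r_trt[0] = [0.4, 0.4, 0.2], 1.2

s_c, r_c = simulate(P_ctrl, r_ctrl, T=2000)
s_t, r_t = simulate(P_trt, r_trt, T=2000)
C_c, R_c, V_c = fit_counts(s_c, r_c, S)
C_t, R_t, V_t = fit_counts(s_t, r_t, S)

# local-treatment refinement (sketched): pool both arms' data on the
# non-targeted states 1 and 2, whose dynamics coincide under both arms
for s in (1, 2):
    C_c[s] = C_t[s] = C_c[s] + C_t[s]
    R_c[s] = R_t[s] = R_c[s] + R_t[s]
    V_c[s] = V_t[s] = V_c[s] + V_t[s]

ate = avg_reward(C_t, R_t, V_t, S) - avg_reward(C_c, R_c, V_c, S)
print(f"estimated long-run average-reward lift: {ate:.4f}")
```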
Related papers
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
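As a loose illustration of the reduced-rank idea in the entry above, the sketch below compresses a matrix of many outcomes into a few indices with plain PCA before any policy learning. PCA is one plausible instantiation chosen here for brevity, and all data are simulated; the paper's actual methodology may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical data: 500 units, 10 observed outcomes driven by 2 latent factors
n, d, k = 500, 10, 2
latent = rng.normal(size=(n, k))
Y = latent @ rng.normal(size=(k, d)) + 0.1 * rng.normal(size=(n, d))

# PCA via SVD: project the outcome matrix onto its top-k principal directions
Yc = Y - Y.mean(axis=0)
U, svals, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = Yc @ Vt[:k].T          # low-dimensional outcome indices per unit
explained = (svals[:k] ** 2).sum() / (svals ** 2).sum()
print(f"variance explained by {k} components: {explained:.2%}")
# downstream, policies would be learned and evaluated against `scores`
# rather than against all d raw outcomes
```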
- Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer [7.451436112917229]
We propose a novel approach to estimating the counterfactual mean outcome under dynamic treatment policies in longitudinal problem settings.
Our approach utilizes a transformer architecture with heterogeneous type embedding trained using temporal-difference learning.
Our method also facilitates statistical inference by providing 95% confidence intervals grounded in statistical theory.
arXiv Detail & Related papers (2024-04-05T20:56:15Z)
- Stage-Aware Learning for Dynamic Treatments [3.6923632650826486]
We propose a novel individualized learning method for dynamic treatment regimes.
By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of methods based on inverse probability weighting estimators (IPWE); a classic IPWE is sketched below.
arXiv Detail & Related papers (2023-10-30T06:35:31Z)
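The IPWE family that the stage-aware entry above improves upon weights each participant's outcome by the inverse probability that their observed treatment sequence fully matches the target regime. The two-stage trial, regime, and data below are invented for illustration; the paper's contribution is precisely to relax the full-alignment indicator used here.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical two-stage trial: actions randomized 50/50 at each stage
n = 5000
x = rng.normal(size=n)                        # baseline covariate
a1 = rng.integers(0, 2, size=n)               # stage-1 treatment
a2 = rng.integers(0, 2, size=n)               # stage-2 treatment
y = x + a1 * (x > 0) + a2 * (x < 0) + rng.normal(size=n)  # final outcome

def target_regime(x):
    """Regime to evaluate: treat at stage 1 iff x > 0, at stage 2 iff x < 0."""
    return (x > 0).astype(int), (x < 0).astype(int)

d1, d2 = target_regime(x)
p = 0.5 * 0.5                                 # known randomization probability
aligned = (a1 == d1) & (a2 == d2)             # trajectory fully follows regime
ipw_value = np.mean(aligned * y / p)          # classic IPWE of the regime value
print(f"IPW estimate of regime value: {ipw_value:.3f}")
```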
- Choosing a Proxy Metric from Past Experiments [54.338884612982405]
In many randomized experiments, the treatment effect on the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
arXiv Detail & Related papers (2023-09-14T17:43:02Z)
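One plausible reading of "constructing an optimal proxy" can be sketched as a regression across past experiments: find the combination of short-term proxy effects that best predicts the long-term effect. The meta-data below are simulated and the least-squares construction is a simplification; the paper's framework additionally accounts for estimation noise in the per-experiment effects.

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical meta-data from m past experiments: per-experiment estimated
# treatment effects on 3 short-term proxies and on the long-term metric
m = 200
proxy_effects = rng.normal(size=(m, 3))
long_term = proxy_effects @ np.array([0.8, 0.1, -0.3]) + 0.2 * rng.normal(size=m)

# least squares across experiments: the composite proxy is the linear
# combination of proxy effects that best predicts the long-term effect
X = np.column_stack([np.ones(m), proxy_effects])
beta, *_ = np.linalg.lstsq(X, long_term, rcond=None)
print("intercept and proxy weights:", np.round(beta, 3))

# in a new experiment, score the composite proxy from its short-term effects
new_proxies = np.array([0.5, -0.2, 0.1])
print(f"predicted long-term effect: {beta[0] + beta[1:] @ new_proxies:.3f}")
```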
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospectively reshuffling participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
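The reshuffling construction above is specific to the paper's resource-allocation setting, so rather than guess at it, the sketch below illustrates the general point — that sample means can be beaten in an RCT — using a different, standard technique: regression adjustment on a pre-treatment covariate, with simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)

# simulated RCT: outcome depends strongly on a pre-treatment covariate
n = 2000
x = rng.normal(size=n)                 # pre-treatment covariate
arm = rng.integers(0, 2, size=n)       # randomized assignment
tau = 0.3                              # true treatment effect
y = 2.0 * x + tau * arm + rng.normal(size=n)

# benchmark: difference in sample means across arms
naive = y[arm == 1].mean() - y[arm == 0].mean()

# regression adjustment: residualize on the covariate before comparing arms;
# both estimators are unbiased, but the adjusted one has lower variance
xc = x - x.mean()
theta = (xc @ y) / (xc @ xc)           # slope of y on the centered covariate
resid = y - theta * xc
adjusted = resid[arm == 1].mean() - resid[arm == 0].mean()

print(f"naive: {naive:.3f}  adjusted: {adjusted:.3f}  (truth: {tau})")
```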
- TCFimt: Temporal Counterfactual Forecasting from Individual Multiple Treatment Perspective [50.675845725806724]
We propose a comprehensive framework for temporal counterfactual forecasting from an individual multiple-treatment perspective (TCFimt).
TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions.
The proposed method outperforms state-of-the-art methods in predicting future outcomes under specific treatments and in choosing the optimal treatment type and timing.
arXiv Detail & Related papers (2022-12-17T15:01:05Z)
- A Reinforcement Learning Approach to Estimating Long-term Treatment Effects [13.371851720834918]
A limitation of randomized experiments is that they do not easily extend to measuring long-term effects.
We take a reinforcement learning (RL) approach that estimates the average reward in a Markov process.
Motivated by real-world scenarios where the observed state transition is nonstationary, we develop a new algorithm for a class of nonstationary problems.
arXiv Detail & Related papers (2022-10-14T05:33:19Z)
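Estimating the average reward of a Markov process from a single trajectory — the core task in the RL entry above — can be sketched with standard differential TD(0), which tracks a running average-reward estimate alongside a differential value function. This textbook scheme is a stand-in, not the paper's algorithm, which additionally handles nonstationary transitions; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy 2-state Markov reward process
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r_mean = np.array([1.0, -0.5])

S, T = 2, 50_000
v = np.zeros(S)            # differential value estimates
rho = 0.0                  # running average-reward estimate
alpha, beta = 0.05, 0.01   # step sizes
s = 0
for _ in range(T):
    rwd = r_mean[s] + rng.normal(0, 0.1)
    s2 = rng.choice(S, p=P[s])
    delta = rwd - rho + v[s2] - v[s]   # differential TD error
    v[s] += alpha * delta
    rho += beta * delta
    s = s2

# ground truth: stationary distribution of P weighted by mean rewards
w, vecs = np.linalg.eig(P.T)
pi = np.abs(np.real(vecs[:, np.argmax(np.real(w))]))
pi /= pi.sum()
print(f"TD estimate: {rho:.3f}  truth: {pi @ r_mean:.3f}")
```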
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- Stochastic Intervention for Causal Inference via Reinforcement Learning [7.015556609676951]
Central to causal inference is the treatment effect estimation of intervention strategies.
Existing methods are mostly restricted to deterministic treatments and compare outcomes under different treatments.
We propose a new, effective framework for estimating the treatment effect of stochastic interventions.
arXiv Detail & Related papers (2021-05-28T00:11:22Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound by regularizing the representation-induced distance between treatment groups.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
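The objective structure in the last entry — factual prediction error plus a penalty on the representation-induced distance between treatment groups — can be sketched as below. The linear representation, the mean-gap distance, and all data are simplifications for illustration; the paper uses learned neural representations and integral probability metric bounds, and the loss here is only evaluated, not trained.

```python
import numpy as np

rng = np.random.default_rng(6)

# simulated observational data with confounded treatment assignment
n, d, k = 500, 5, 3
x = rng.normal(size=(n, d))
t = (x[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # confounded treatment
y = x[:, 0] + 2.0 * t + rng.normal(size=n)

def balancing_loss(W, heads, lam):
    """Factual prediction error plus a distance between the treated and
    control representation means (a simple linear-MMD distance choice)."""
    phi = x @ W                                       # representation
    pred = phi @ heads[0] * (t == 0) + phi @ heads[1] * (t == 1)
    mse = np.mean((y - pred) ** 2)
    gap = phi[t == 1].mean(axis=0) - phi[t == 0].mean(axis=0)
    return mse + lam * gap @ gap                      # penalized objective

W = rng.normal(size=(d, k)) * 0.1                     # toy linear representation
heads = [rng.normal(size=k), rng.normal(size=k)]      # per-arm outcome heads
for lam in (0.0, 1.0):
    print(f"lambda={lam}: penalized loss = {balancing_loss(W, heads, lam):.3f}")
```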
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information (including any of its content) and is not responsible for any consequences arising from its use.