Targeting for long-term outcomes
- URL: http://arxiv.org/abs/2010.15835v2
- Date: Sat, 9 Apr 2022 17:35:17 GMT
- Title: Targeting for long-term outcomes
- Authors: Jeremy Yang, Dean Eckles, Paramveer Dhillon, Sinan Aral
- Abstract summary: Decision makers often want to target interventions so as to maximize an outcome that is observed only in the long-term.
Here we build on the statistical surrogacy and policy learning literatures to impute the missing long-term outcomes.
We apply our approach in two large-scale proactive churn management experiments at The Boston Globe.
- Score: 1.7205106391379026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision makers often want to target interventions so as to maximize an
outcome that is observed only in the long-term. This typically requires
delaying decisions until the outcome is observed or relying on simple
short-term proxies for the long-term outcome. Here we build on the statistical
surrogacy and policy learning literatures to impute the missing long-term
outcomes and then approximate the optimal targeting policy on the imputed
outcomes via a doubly-robust approach. We first show that conditions for the
validity of average treatment effect estimation with imputed outcomes are also
sufficient for valid policy evaluation and optimization; furthermore, these
conditions can be somewhat relaxed for policy optimization. We apply our
approach in two large-scale proactive churn management experiments at The
Boston Globe by targeting optimal discounts to its digital subscribers with the
aim of maximizing long-term revenue. Using the first experiment, we evaluate
this approach empirically by comparing the policy learned using imputed
outcomes with a policy learned on the ground-truth, long-term outcomes. The
performance of these two policies is statistically indistinguishable, and we
rule out large losses from relying on surrogates. Our approach also outperforms
a policy learned on short-term proxies for the long-term outcome. In a second
field experiment, we implement the optimal targeting policy with additional
randomized exploration, which allows us to update the optimal policy for future
subscribers. Over three years, our approach had a net-positive revenue impact
in the range of $4-5 million compared to the status quo.
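The recipe described in the abstract has two steps: impute the missing long-term outcome from short-term surrogates (a surrogate index), then evaluate or optimize the targeting policy with a doubly-robust score computed on the imputed outcomes. The sketch below is a minimal illustration of those two steps under stated assumptions (binary treatment, a known randomization probability from the experiment, scikit-learn gradient boosting as the nuisance models, hypothetical variable names); it is not the authors' implementation.

```python
# Minimal sketch of the two-step approach in the abstract, NOT the authors' code.
# Assumptions: binary treatment W (discount vs. none), a known randomization
# probability p_treat, short-term surrogates S, and a historical sample in
# which the long-term outcome Y is observed.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def surrogate_index(S_hist, y_hist, S_exp):
    """Step 1: impute the long-term outcome from short-term surrogates.

    Fit E[Y | S] on historical data where Y is observed, then apply the
    fitted model to the experimental sample's surrogates."""
    model = GradientBoostingRegressor().fit(S_hist, y_hist)
    return model.predict(S_exp)

def dr_policy_value(X, W, y_imp, policy, p_treat=0.5):
    """Step 2: doubly-robust (AIPW) estimate of a targeting policy's value.

    X: targeting covariates, W: 0/1 treatment assignment from the experiment,
    y_imp: imputed long-term outcome, policy: function X -> {0, 1}."""
    # Outcome regressions for each arm, fit on the imputed outcomes.
    mu1 = GradientBoostingRegressor().fit(X[W == 1], y_imp[W == 1]).predict(X)
    mu0 = GradientBoostingRegressor().fit(X[W == 0], y_imp[W == 0]).predict(X)

    pi = policy(X)                                  # policy's recommended action
    mu_pi = np.where(pi == 1, mu1, mu0)             # model value under that action
    mu_obs = np.where(W == 1, mu1, mu0)             # model value under observed action
    prop = np.where(W == 1, p_treat, 1 - p_treat)   # P(observed action)
    match = (W == pi).astype(float)                 # observed action matches policy?

    # AIPW score: regression prediction plus an inverse-propensity-weighted residual.
    scores = mu_pi + match / prop * (y_imp - mu_obs)
    return scores.mean()
```

Policy optimization then amounts to searching a policy class for the rule that maximizes this doubly-robust value, for example by weighted classification on per-unit doubly-robust treatment-effect scores; in a cross-fitted variant the nuisance models would be fit on held-out folds.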
Related papers
- Efficient Multi-Policy Evaluation for Reinforcement Learning [25.83084281519926]
We design a tailored behavior policy to reduce the variance of estimators across all target policies.
We show our estimator has a substantially lower variance compared with previous best methods.
arXiv Detail & Related papers (2024-08-16T12:33:40Z)
- Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning [17.245293915129942]
In deep reinforcement learning applications, maximizing discounted reward is often employed instead of maximizing total reward.
We analyze the suboptimality of the policy obtained by maximizing the discounted reward relative to the policy that maximizes the total reward.
We develop methods to align the optimal policies of the two objectives in certain situations, which can improve the performance of reinforcement learning algorithms.
arXiv Detail & Related papers (2024-07-18T08:33:10Z)
- Policy Learning for Balancing Short-Term and Long-Term Rewards [11.859587700058235]
This paper formalizes a new framework for learning the optimal policy, where some long-term outcomes are allowed to be missing.
We show that short-term outcomes, when associated with the long-term outcome, contribute to improving the estimator of the long-term reward.
arXiv Detail & Related papers (2024-05-06T10:09:35Z)
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z)
- Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal [5.044100238869374]
We analyze the value of targeting in a large-scale field experiment with over 53,000 college students.
We show that targeting based on low baseline outcomes is most effective in our specific application.
arXiv Detail & Related papers (2023-10-12T19:08:45Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Batch Policy Learning in Average Reward Markov Decision Processes [3.9023554886892438]
Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward.
We develop an optimization algorithm to compute the optimal policy in a parameterized policy class.
The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy.
arXiv Detail & Related papers (2020-07-23T03:28:14Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small, more conservative modification to the Bellman optimality and evaluation back-ups can yield much stronger guarantees.
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
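To see why the density ratio fails in the last entry's setting: a deterministic target policy puts all its mass on a single continuous action, so the indicator 1{a = pi(x)} is zero almost surely under any stochastic behavior policy and the usual importance weight is undefined. A common fix, in the same spirit as the kernelization described there (though not that paper's exact estimators), is to smooth the indicator with a kernel. The sketch below assumes logged actions, rewards, and known behavior-policy densities, all with hypothetical names.

```python
import numpy as np

def kernel_is_value(actions, rewards, behavior_density, target_actions, h=0.1):
    """Kernel-smoothed importance-sampling value estimate for a deterministic policy.

    actions: logged continuous actions a_i, rewards: observed rewards r_i,
    behavior_density: pi_b(a_i | x_i) evaluated at the logged pairs,
    target_actions: pi(x_i), the deterministic policy's action for each context,
    h: kernel bandwidth (bias grows with h, variance shrinks as n * h grows).
    The exact weight 1{a_i = pi(x_i)} / pi_b(a_i | x_i) is zero almost surely,
    so the indicator is replaced by a Gaussian kernel K_h(a_i - pi(x_i))."""
    u = (actions - target_actions) / h
    k = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * h)
    return np.mean(k / behavior_density * rewards)
```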
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.