Policy Learning for Balancing Short-Term and Long-Term Rewards
- URL: http://arxiv.org/abs/2405.03329v2
- Date: Mon, 16 Sep 2024 00:19:16 GMT
- Title: Policy Learning for Balancing Short-Term and Long-Term Rewards
- Authors: Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng,
- Abstract summary: This paper formalizes a new framework for learning the optimal policy, where some long-term outcomes are allowed to be missing.
We show that short-term outcomes, if associated with the long-term outcome, contribute to improving the estimator of the long-term reward.
- Score: 11.859587700058235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both long-term and short-term rewards, where some long-term outcomes are allowed to be missing. In particular, we first present the identifiability of both rewards under mild assumptions. Next, we deduce the semiparametric efficiency bounds, along with the consistency and asymptotic normality of their estimators. We also reveal that short-term outcomes, if associated, contribute to improving the estimator of the long-term reward. Based on the proposed estimators, we develop a principled policy learning approach and further derive the convergence rates of regret and estimation errors associated with the learned policy. Extensive experiments are conducted to validate the effectiveness of the proposed method, demonstrating its practical applicability.
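The abstract does not spell out the estimator, so the following is a minimal illustrative sketch of one plausible reading: an inverse-propensity-weighted estimate of a weighted policy value V_w(pi) = w * E[short-term reward] + (1 - w) * E[long-term reward], with an additional inverse-missingness weight for the partially missing long-term outcome. All names (weighted_policy_value, obs_prob, w, the simulated data) are illustrative assumptions, not the paper's actual method or code.

```python
# Hedged sketch (not the paper's estimator): IPW-style estimate of a policy value
# that mixes short- and long-term rewards, where the long-term outcome is only
# observed for a subset of units and is reweighted by its observation probability.
import numpy as np

def weighted_policy_value(pi_a, propensity_a, y_short, y_long, observed, obs_prob, w=0.5):
    """Estimate V_w(pi) = w * E[Y_short(pi)] + (1 - w) * E[Y_long(pi)] via IPW."""
    iw = pi_a / propensity_a                                  # policy importance weight pi(A|X) / e(A|X)
    v_short = np.mean(iw * y_short)                           # short-term reward: standard IPW
    y_long_filled = np.where(observed == 1, y_long, 0.0)      # zero out missing long-term outcomes
    v_long = np.mean(iw * (observed / obs_prob) * y_long_filled)  # reweight observed units by 1/P(observed)
    return w * v_short + (1.0 - w) * v_long

# Toy usage: compare two candidate policies on simulated data.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e_x = 1.0 / (1.0 + np.exp(-x))                  # behavior policy P(A=1|X)
a = rng.binomial(1, e_x)
y_short = a * (1.0 + x) + rng.normal(size=n)
y_long = a * (2.0 - x) + rng.normal(size=n)
obs_prob = np.full(n, 0.7)                      # long-term outcome observed with probability 0.7
observed = rng.binomial(1, obs_prob)

propensity_a = np.where(a == 1, e_x, 1.0 - e_x)  # P(A = observed treatment | X)
for name, pi_a in [("always-treat", (a == 1).astype(float)),
                   ("never-treat", (a == 0).astype(float))]:
    v = weighted_policy_value(pi_a, propensity_a, y_short, y_long, observed, obs_prob, w=0.5)
    print(name, round(float(v), 3))
```

This assumes binary treatment, unconfoundedness, and missing-at-random long-term outcomes; the paper's efficient estimators and policy-learning procedure are more involved.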
Related papers
- Reduced-Rank Multi-objective Policy Learning and Optimization [57.978477569678844]
In practice, causal researchers do not have a single outcome in mind a priori.
In government-assisted social benefit programs, policymakers collect many outcomes to understand the multidimensional nature of poverty.
We present a data-driven dimensionality-reduction methodology for multiple outcomes in the context of optimal policy learning.
arXiv Detail & Related papers (2024-04-29T08:16:30Z) - Long-term Off-Policy Evaluation and Learning [21.047613223586794]
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects.
It takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow.
We propose a new framework called Long-term Off-Policy Evaluation (LOPE), which is based on reward function decomposition.
arXiv Detail & Related papers (2024-04-24T06:59:59Z) - Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects [36.46155152979874]
How to trade off short-term effects, long-term effects, or both to achieve optimal treatment remains an open challenge.
In this paper, we systematically investigate these issues and introduce a Pareto-efficient algorithm comprising Pareto-Optimal Estimation (POE) and Pareto-Optimal Policy Learning (POPL).
Results on both the synthetic and real-world datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2024-03-05T03:32:02Z) - Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step.
We propose a practical Primal-Dual algorithm to tackle it and demonstrate that it exhibits competitive average performance on time series benchmarks while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z) - Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate [41.51680686036846]
We introduce a long-term fairness concept named Equal Long-term Benefit Rate (ELBERT) to address biases in sequential decision-making.
ELBERT effectively addresses the temporal discrimination issues found in previous long-term fairness notions.
We show that ELBERT-PO significantly diminishes bias while maintaining high utility.
arXiv Detail & Related papers (2023-09-07T01:10:01Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMPLANT (imitation with planning at test time) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Targeting for long-term outcomes [1.7205106391379026]
Decision makers often want to target interventions so as to maximize an outcome that is observed only in the long-term.
Here we build on the statistical surrogacy and policy learning literatures to impute the missing long-term outcomes.
We apply our approach in two large-scale proactive churn management experiments at The Boston Globe.
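The surrogacy-based imputation in the entry above ("Targeting for long-term outcomes") can be pictured with a small, generic sketch: fit a model of the long-term outcome on short-term surrogates for units where it is observed, then impute it where it is missing before policy learning. This is an assumption-laden illustration, not that paper's procedure; the names (impute_long_term, surrogates, observed) are hypothetical.

```python
# Hedged sketch of a generic surrogate-index imputation step, not the paper's method.
import numpy as np
from sklearn.linear_model import LinearRegression

def impute_long_term(surrogates, y_long, observed):
    """Replace missing long-term outcomes with predictions from short-term surrogates.

    surrogates : (n, k) array of short-term outcomes / surrogate features
    y_long     : (n,) long-term outcome, arbitrary values where missing
    observed   : (n,) boolean mask, True where y_long is observed
    """
    model = LinearRegression().fit(surrogates[observed], y_long[observed])
    imputed = y_long.copy()
    imputed[~observed] = model.predict(surrogates[~observed])
    return imputed

# Toy usage with simulated data.
rng = np.random.default_rng(1)
s = rng.normal(size=(1000, 3))                        # three short-term surrogates
y = s @ np.array([0.5, -0.2, 1.0]) + rng.normal(size=1000)
obs = rng.random(1000) < 0.6                          # 60% of long-term outcomes observed
print(impute_long_term(s, y, obs)[:5])
```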
arXiv Detail & Related papers (2020-10-29T18:31:17Z) - Long-Term Effect Estimation with Surrogate Representation [43.932546958874696]
This work studies the problem of long-term effect estimation, where the outcome of primary interest, or primary outcome, takes months or even years to accumulate.
We propose to build connections between long-term causal inference and sequential models in machine learning.
arXiv Detail & Related papers (2020-08-19T03:16:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.