Positivity-free Policy Learning with Observational Data
- URL: http://arxiv.org/abs/2310.06969v1
- Date: Tue, 10 Oct 2023 19:47:27 GMT
- Title: Positivity-free Policy Learning with Observational Data
- Authors: Pan Zhao, Antoine Chambaz, Julie Josse, Shu Yang
- Abstract summary: This study introduces a novel positivity-free (stochastic) policy learning framework.
We propose incremental propensity score policies to adjust propensity score values instead of assigning fixed values to treatments.
This paper provides a thorough exploration of the theoretical guarantees associated with policy learning and validates the proposed framework's finite-sample performance.
- Score: 8.293758599118618
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy learning utilizing observational data is pivotal across various
domains, with the objective of learning the optimal treatment assignment policy
while adhering to specific constraints such as fairness, budget, and
simplicity. This study introduces a novel positivity-free (stochastic) policy
learning framework designed to address the challenges posed by the
impracticality of the positivity assumption in real-world scenarios. This
framework leverages incremental propensity score policies to adjust propensity
score values instead of assigning fixed values to treatments. We characterize
these incremental propensity score policies and establish identification
conditions, employing semiparametric efficiency theory to propose efficient
estimators capable of achieving rapid convergence rates, even when integrated
with advanced machine learning algorithms. This paper provides a thorough
exploration of the theoretical guarantees associated with policy learning and
validates the proposed framework's finite-sample performance through
comprehensive numerical experiments, ensuring the identification of causal
effects from observational data is both robust and reliable.
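The core of the positivity-free idea is that an incremental propensity score policy multiplies the odds of treatment rather than forcing a fixed treatment probability. A minimal sketch of that transformation (following the standard incremental-intervention form; the function name and example values are illustrative, not taken from the paper):

```python
import numpy as np

def incremental_policy(propensity, delta):
    """Shift each propensity score by an odds multiplier delta.

    The intervened treatment probability is
        q(x) = delta * pi(x) / (delta * pi(x) + 1 - pi(x)),
    which stays in [0, 1] for any pi(x) in [0, 1] -- including
    pi(x) = 0 or 1 -- so no positivity assumption is needed.
    """
    propensity = np.asarray(propensity, dtype=float)
    return delta * propensity / (delta * propensity + 1.0 - propensity)

# Example: doubling the odds of treatment (delta = 2)
pi = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
q = incremental_policy(pi, delta=2.0)
```

Note that units with propensity exactly 0 or 1 keep those values under the shifted policy, which is precisely why identification does not require positivity.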
Related papers
- Towards Theoretical Understanding of Data-Driven Policy Refinement [0.0]
This paper presents an approach for data-driven policy refinement in reinforcement learning, specifically designed for safety-critical applications.
Our principal contribution lies in the mathematical formulation of this data-driven policy refinement concept.
We present a series of theorems elucidating key theoretical properties of our approach, including convergence, robustness bounds, generalization error, and resilience to model mismatch.
arXiv Detail & Related papers (2023-05-11T13:36:21Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score (UIPS) estimator for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospectively reshuffling participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning [8.736154600219685]
Policy evaluation in online learning has attracted increasing attention, yet the problem is particularly challenging due to the dependent data generated in the online environment.
We develop the doubly robust interval estimation (DREAM) method to infer the value under the estimated optimal policy in online learning.
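Doubly robust methods of this kind combine an outcome model with inverse propensity weighting, so the estimate remains consistent if either component is correct. A generic sketch of the underlying AIPW-style value estimate (the function name and toy numbers are illustrative, not the DREAM method itself):

```python
import numpy as np

def doubly_robust_value(actions, rewards, target_actions, propensities, q_hat):
    """Generic doubly robust (AIPW) estimate of a target policy's value.

    q_hat[i] is an outcome-model prediction for the target action on
    unit i; the correction term reweights the observed residual by the
    inverse propensity, but only where the logged action matches the
    target action (elsewhere the indicator zeroes it out).
    """
    match = (np.asarray(actions) == np.asarray(target_actions)).astype(float)
    correction = match * (np.asarray(rewards) - q_hat) / np.asarray(propensities)
    return float(np.mean(q_hat + correction))

# Toy example with two logged units
v = doubly_robust_value(
    actions=[1, 0],
    rewards=[1.0, 0.0],
    target_actions=[1, 1],
    propensities=[0.5, 0.5],
    q_hat=np.array([0.8, 0.2]),
)
```

If the outcome model is exact, the residual correction vanishes; if the propensities are exact, the correction removes the outcome model's bias on average, which is the source of the double robustness.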
arXiv Detail & Related papers (2021-10-29T02:38:54Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Efficient Policy Learning from Surrogate-Loss Classification Reductions [65.91730154730905]
We consider the estimation problem given by a weighted surrogate-loss classification reduction of policy learning.
We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters.
We propose an estimation approach based on generalized method of moments, which is efficient for the policy parameters.
arXiv Detail & Related papers (2020-02-12T18:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.