Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning
- URL: http://arxiv.org/abs/2002.04518v2
- Date: Mon, 13 Jul 2020 04:03:17 GMT
- Title: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning
- Authors: Nathan Kallus and Angela Zhou
- Abstract summary: Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare.
We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
- Score: 70.01650994156797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy evaluation of sequential decision policies from observational data
is necessary in applications of batch reinforcement learning such as education
and healthcare. In such settings, however, unobserved variables confound
observed actions, rendering exact evaluation of new policies impossible, i.e.,
unidentifiable. We develop a robust approach that estimates sharp bounds on the
(unidentifiable) value of a given policy in an infinite-horizon problem given
data from another policy with unobserved confounding, subject to a sensitivity
model. We consider stationary or baseline unobserved confounding and compute
bounds by optimizing over the set of all stationary state-occupancy ratios that
agree with a new partially identified estimating equation and the sensitivity
model. We prove convergence to the sharp bounds as we collect more confounded
data. Although checking set membership is a linear program, the support
function is given by a difficult nonconvex optimization problem. We develop
approximations based on nonconvex projected gradient descent and demonstrate
the resulting bounds empirically.
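The abstract describes the computational strategy only at a high level. As a rough illustration, below is a minimal, hypothetical sketch of projected gradient descent over candidate state-occupancy ratios on a toy finite state space; the penalty relaxation of the estimating equation, the box constraint [1/Gamma, Gamma] standing in for the sensitivity model, and all names (rewards, residual_op, Gamma, lam) are illustrative assumptions of this sketch, not the paper's actual construction.

```python
# Sketch only: projected gradient ascent on a penalized objective over
# state-occupancy ratios w, with a box constraint [1/Gamma, Gamma] used here
# as a stand-in for the sensitivity model. All quantities are synthetic.
import numpy as np

rng = np.random.default_rng(0)

n_states = 20
Gamma = 2.0      # hypothetical sensitivity parameter
lam = 10.0       # penalty weight on the estimating-equation residual
step = 0.05
n_iters = 500

# Placeholders for quantities that would be estimated from confounded data.
rewards = rng.uniform(0.0, 1.0, size=n_states)                  # mean reward per state
residual_op = rng.normal(0.0, 0.1, size=(n_states, n_states))   # toy linear estimating-equation operator

def grad_objective(w, sign):
    # Gradient of sign * (normalized weighted value) - lam * ||residual_op @ w||^2.
    # sign = +1 targets an upper bound, sign = -1 a lower bound.
    S = w.sum()
    val_grad = (rewards * S - np.dot(w, rewards)) / S**2
    pen_grad = 2.0 * lam * residual_op.T @ (residual_op @ w)
    return sign * val_grad - pen_grad

def project(w):
    # Projection onto the box implied by the sensitivity parameter.
    return np.clip(w, 1.0 / Gamma, Gamma)

def pgd_bound(sign):
    w = np.ones(n_states)
    for _ in range(n_iters):
        w = project(w + step * grad_objective(w, sign))
    return float(np.dot(w, rewards) / w.sum())

print("lower bound ~", pgd_bound(-1.0))
print("upper bound ~", pgd_bound(+1.0))
```

The penalty term is only a convenient relaxation for this toy example; the paper itself optimizes the support function over the partially identified set, which is a harder nonconvex problem.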
Related papers
- High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Matrix Estimation for Offline Reinforcement Learning with Low-Rank
Structure [10.968373699696455]
We consider offline Reinforcement Learning (RL), where the agent does not interact with the environment and must rely on offline data collected using a behavior policy.
Previous works provide policy evaluation guarantees when the target policy to be evaluated is covered by the behavior policy.
We propose an offline policy evaluation algorithm that leverages the low-rank structure to estimate the values of uncovered state-action pairs.
arXiv Detail & Related papers (2023-05-24T23:49:06Z) - Offline Policy Evaluation and Optimization under Confounding [35.778917456294046]
We map out the landscape of offline policy evaluation for confounded MDPs.
We characterize settings where consistent value estimates are provably not achievable.
We present new algorithms for offline policy improvement and prove local convergence guarantees.
arXiv Detail & Related papers (2022-11-29T20:45:08Z) - A Sharp Characterization of Linear Estimators for Offline Policy
Evaluation [33.37672297925897]
Offline policy evaluation is a fundamental statistical problem in reinforcement learning.
We identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods.
Our results provide a complete picture of the behavior of linear estimators for offline policy evaluation.
arXiv Detail & Related papers (2022-03-08T17:52:57Z) - Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and
Dual Bounds [21.520045697447372]
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies.
This work considers the problem of constructing non-asymptotic confidence intervals in infinite-horizon off-policy evaluation.
We develop a practical algorithm through a primal-dual optimization-based approach.
arXiv Detail & Related papers (2021-03-09T22:31:20Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic
Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z) - Statistically Efficient Off-Policy Policy Gradients [80.42316902296832]
We consider the statistically efficient estimation of policy gradients from off-policy data.
We propose a meta-algorithm that achieves the lower bound without any parametric assumptions.
We establish guarantees on the rate at which we approach a stationary point when we take steps in the direction of our new estimated policy gradient.
arXiv Detail & Related papers (2020-02-10T18:41:25Z)