Sample Complexity of Nonparametric Off-Policy Evaluation on
Low-Dimensional Manifolds using Deep Networks
- URL: http://arxiv.org/abs/2206.02887v1
- Date: Mon, 6 Jun 2022 20:25:20 GMT
- Title: Sample Complexity of Nonparametric Off-Policy Evaluation on
Low-Dimensional Manifolds using Deep Networks
- Authors: Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
- Abstract summary: We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
- Score: 71.95722100511627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the off-policy evaluation problem of reinforcement learning using
deep neural networks. We analyze the deep fitted Q-evaluation method for
estimating the expected cumulative reward of a target policy, when the data are
generated from an unknown behavior policy. We show that, by choosing network
size appropriately, one can leverage the low-dimensional manifold structure in
the Markov decision process and obtain a sample-efficient estimator without
suffering from the curse of high representation dimensionality. Specifically,
we establish a sharp error bound for the fitted Q-evaluation that depends on
the intrinsic low dimension, the smoothness of the state-action space, and a
function class-restricted $\chi^2$-divergence. It is noteworthy that the
restricted $\chi^2$-divergence measures the behavior and target policies' mismatch
in the function space, which can be small even if the two policies
are not close to each other in their tabular forms. Numerical experiments are
provided to support our theoretical analysis.
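For readers skimming this listing, the following sketches make the fitted Q-evaluation (FQE) procedure concrete. The formulation below is the standard one for the infinite-horizon discounted setting; the discount factor $\gamma$, initial state distribution $\xi$, and function-class notation $\mathcal{F}$ are conventional symbols introduced here for illustration and are not taken verbatim from the paper.

```latex
% Value of the target policy \pi under an initial state distribution \xi:
V^{\pi} = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 \sim \xi,\; a_t \sim \pi(\cdot \mid s_t)\right].
% FQE regression over the deep network class \mathcal{F}, using logged transitions (s_i, a_i, r_i, s_i'):
\widehat{Q}_{k+1} \in \arg\min_{f \in \mathcal{F}} \sum_{i}
  \left( f(s_i, a_i) - r_i - \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s_i')}\!\left[\widehat{Q}_k(s_i', a')\right] \right)^{2},
% and the plug-in value estimate after K iterations:
\widehat{V}^{\pi} = \mathbb{E}_{s \sim \xi,\; a \sim \pi(\cdot \mid s)}\!\left[\widehat{Q}_K(s, a)\right].
```

A minimal code sketch of the same recursion is given below. It assumes a discrete action space, a small MLP Q-network, and a hypothetical helper `target_policy_probs(states)` returning $\pi(a \mid s)$ for each logged state; the architecture and optimizer settings are illustrative choices, not the configuration used in the paper's experiments.

```python
# Minimal fitted Q-evaluation (FQE) sketch in PyTorch.
# Assumptions (illustrative, not from the paper): discrete actions, an MLP Q-network,
# and a helper target_policy_probs(states) -> (n, num_actions) tensor of pi(a|s).
import copy
import torch
import torch.nn as nn

def fitted_q_evaluation(states, actions, rewards, next_states,
                        target_policy_probs, num_actions,
                        gamma=0.99, num_iters=50, lr=1e-3, steps_per_iter=200):
    state_dim = states.shape[1]
    q_net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                          nn.Linear(256, 256), nn.ReLU(),
                          nn.Linear(256, num_actions))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)

    for _ in range(num_iters):
        # Freeze the previous iterate Q_k and form Bellman regression targets under pi.
        q_prev = copy.deepcopy(q_net)
        with torch.no_grad():
            pi_next = target_policy_probs(next_states)            # pi(a' | s'), shape (n, A)
            v_next = (pi_next * q_prev(next_states)).sum(dim=1)   # E_{a'~pi}[Q_k(s', a')]
            targets = rewards + gamma * v_next

        # Regress Q_{k+1}(s, a) onto the targets (least squares over the network class).
        for _ in range(steps_per_iter):
            q_sa = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
            loss = ((q_sa - targets) ** 2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # Plug-in value estimate: average Q_K over initial states drawn from xi and actions from pi,
    # e.g. (target_policy_probs(init_states) * q_net(init_states)).sum(dim=1).mean().
    return q_net
```

The abstract's error bound controls the gap between this plug-in estimate and the true target value in terms of the intrinsic low dimension, the smoothness of the state-action space, and the function class-restricted $\chi^2$-divergence between the behavior and target policies.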
Related papers
- Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z)
- Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result shows that sparsity-aware methods can make batch RL more sample-efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
- On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation [27.077741143188867]
We propose a family of algorithms which split the classification task into two stages: representation learning and uncertainty estimation.
We evaluate their performance in terms of selective classification (risk-coverage) and their ability to detect out-of-distribution samples.
arXiv Detail & Related papers (2020-01-22T15:08:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.