On Finite-Sample Analysis of Offline Reinforcement Learning with Deep
ReLU Networks
- URL: http://arxiv.org/abs/2103.06671v1
- Date: Thu, 11 Mar 2021 14:01:14 GMT
- Title: On Finite-Sample Analysis of Offline Reinforcement Learning with Deep
ReLU Networks
- Authors: Thanh Nguyen-Tang, Sunil Gupta, Hung Tran-The, Svetha Venkatesh
- Abstract summary: We study the statistical theory of offline reinforcement learning with deep ReLU networks.
We quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error.
- Score: 46.067702683141356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the statistical theory of offline reinforcement learning
with deep ReLU networks. We consider the off-policy evaluation (OPE) problem
where the goal is to estimate the expected discounted reward of a target policy
given the logged data generated by unknown behaviour policies. We study a
regression-based fitted Q evaluation (FQE) method using deep ReLU networks and
characterize a finite-sample bound on the estimation error of this method under
mild assumptions. Prior works on OPE with either general function
approximation or deep ReLU networks ignore the data-dependent structure in the
algorithm, dodging the technical bottleneck of OPE, while requiring a rather
restrictive regularity assumption. In this work, we overcome these limitations
and provide a comprehensive analysis of OPE with deep ReLU networks. In
particular, we precisely quantify how the distribution shift of the offline
data, the dimension of the input space, and the regularity of the system
control the OPE estimation error. Consequently, we provide insights into the
interplay between offline reinforcement learning and deep learning.
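For intuition, here is a minimal sketch of the regression-based fitted Q evaluation (FQE) procedure that the paper analyses, written with a deep ReLU network in PyTorch. All names, shapes, and hyperparameters (QNet, fqe, gamma, the dataset layout) are illustrative assumptions, not code from the paper; the paper's contribution is the finite-sample error bound for estimators of this type, not an implementation.

```python
# Illustrative FQE sketch with a deep ReLU network (PyTorch).
# Names and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Deep ReLU network mapping a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim, action_dim, width=256, depth=3):
        super().__init__()
        layers, in_dim = [], state_dim + action_dim
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def fqe(dataset, target_policy, state_dim, action_dim,
        gamma=0.99, outer_iters=50, inner_steps=200, lr=1e-3):
    """Regression-based FQE: repeatedly regress a ReLU network onto the Bellman
    target induced by the target policy, using only logged offline transitions.

    dataset: tensors (s, a, r, s_next) collected by unknown behaviour policies.
    target_policy: callable mapping a batch of states to the target policy's actions.
    """
    s, a, r, s_next = dataset
    q = QNet(state_dim, action_dim)
    opt = torch.optim.Adam(q.parameters(), lr=lr)
    for _ in range(outer_iters):
        with torch.no_grad():
            # Fixed regression target built from the previous Q estimate.
            y = r + gamma * q(s_next, target_policy(s_next))
        for _ in range(inner_steps):
            loss = ((q(s, a) - y) ** 2).mean()   # least-squares regression step
            opt.zero_grad()
            loss.backward()
            opt.step()
    return q  # OPE estimate: average q(s0, target_policy(s0)) over initial states s0
```

Each outer iteration is one step of fitted Q iteration: the Bellman target is frozen, so the inner loop is an ordinary least-squares regression, which is what makes a regression-style finite-sample analysis of the estimator possible.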
Related papers
- Neural Network Approximation for Pessimistic Offline Reinforcement
Learning [17.756108291816908]
We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate in the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - UAV Path Planning Employing MPC- Reinforcement Learning Method for
search and rescue mission [0.0]
We tackle the problem of Unmanned Aerial Vehicle (UAV) path planning in complex and uncertain environments.
We design a Model Predictive Control (MPC) scheme based on a Long Short-Term Memory (LSTM) network integrated into the Deep Deterministic Policy Gradient algorithm.
arXiv Detail & Related papers (2023-02-21T13:39:40Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Sample Complexity of Nonparametric Off-Policy Evaluation on
Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z) - A Sharp Characterization of Linear Estimators for Offline Policy
Evaluation [33.37672297925897]
Offline policy evaluation is a fundamental statistical problem in reinforcement learning.
We identify simple control-theoretic and linear-algebraic conditions that are necessary and sufficient for classical methods.
Our results provide a complete picture of the behavior of linear estimators for offline policy evaluation.
arXiv Detail & Related papers (2022-03-08T17:52:57Z) - Uncertainty-Based Offline Reinforcement Learning with Diversified
Q-Ensemble [16.92791301062903]
We propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution.
Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks simply by increasing the number of Q-networks used in clipped Q-learning (a sketch of this clipped ensemble target appears after this list).
arXiv Detail & Related papers (2021-10-04T16:40:13Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
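As a concrete illustration of the clipped Q-ensemble idea from the Diversified Q-Ensemble entry above, the sketch below forms the Bellman target from the element-wise minimum over an ensemble of Q-networks. The function name, ensemble layout, and reuse of the QNet class from the FQE sketch are assumptions made for illustration, not the cited paper's implementation.

```python
# Illustrative clipped Q-ensemble target (not the cited paper's code): the Bellman
# target takes the element-wise minimum over N independent Q-networks, so a larger
# ensemble yields a more conservative (pessimistic) value estimate.
import torch

def clipped_ensemble_target(q_ensemble, r, s_next, a_next, gamma=0.99):
    """Pessimistic Bellman target: minimum over an ensemble of Q-networks.

    q_ensemble: list of Q-networks (e.g. QNet instances from the sketch above).
    """
    with torch.no_grad():
        q_values = torch.stack([q(s_next, a_next) for q in q_ensemble])  # shape (N, batch)
        return r + gamma * q_values.min(dim=0).values                    # clip via ensemble minimum
```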