Neural Network Approximation for Pessimistic Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2312.11863v1
- Date: Tue, 19 Dec 2023 05:17:27 GMT
- Title: Neural Network Approximation for Pessimistic Offline Reinforcement
Learning
- Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu
- Abstract summary: We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
- Score: 17.756108291816908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (RL) has shown remarkable success in specific
offline decision-making scenarios, yet its theoretical guarantees are still
under development. Existing works on offline RL theory primarily emphasize a
few trivial settings, such as linear MDP or general function approximation with
strong assumptions and independent data, which lack guidance for practical use.
The coupling of deep learning and Bellman residuals makes this problem
challenging, in addition to the difficulty of data dependence. In this paper,
we establish a non-asymptotic estimation error of pessimistic offline RL using
general neural network approximation with $\mathcal{C}$-mixing data regarding
the structure of networks, the dimension of datasets, and the concentrability
of data coverage, under mild assumptions. Our result shows that the estimation
error consists of two parts: the first converges to zero at a desired rate on
the sample size with partially controllable concentrability, and the second
becomes negligible if the residual constraint is tight. This result
demonstrates the explicit efficiency of deep adversarial offline RL frameworks.
We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and
the neural network approximation theory for the H\"{o}lder class to achieve
this. We also develop methods to bound the Bellman estimation error caused by
function approximation with empirical Bellman constraint perturbations.
Additionally, we present a result that lessens the curse of dimensionality
using data with low intrinsic dimensionality and function classes with low
complexity. Our estimation provides valuable insights into the development of
deep offline RL and guidance for algorithm model design.
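To make the shape of this guarantee concrete, the two-part decomposition can be sketched schematically as follows; the symbols $n$, $\alpha$, $\kappa$, and $\epsilon$ are illustrative placeholders (sample size, convergence rate, concentrability coefficient, residual-constraint slack), not the paper's exact notation:

```latex
% Schematic form of the two-part estimation error bound (illustrative notation only).
% n       : sample size of the offline dataset
% alpha   : rate determined by the network class, Hoelder smoothness, and (intrinsic) data dimension
% kappa   : partially controllable concentrability coefficient of the data coverage
% epsilon : slack allowed in the empirical Bellman residual constraint
\mathrm{EstError}
  \;\lesssim\;
  \underbrace{\kappa \, n^{-\alpha}}_{\text{vanishes as } n \to \infty}
  \;+\;
  \underbrace{C \, \epsilon}_{\text{negligible when the residual constraint is tight}}
```

Tightening the empirical Bellman residual constraint drives $\epsilon$ toward zero, leaving only the statistical term that decays with the sample size.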
Related papers
- Deep learning from strongly mixing observations: Sparse-penalized regularization and minimax optimality [0.0]
We consider sparse-penalized regularization for deep neural network predictors.
We deal with the squared loss and a broad class of other loss functions.
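As a rough illustration of what sparse-penalized regularization of a neural predictor can look like, here is a PyTorch-style sketch with an $\ell_1$ penalty; the penalty form and the weight `lam` are illustrative assumptions, not necessarily the estimator analyzed in the paper:

```python
import torch
import torch.nn as nn

def sparse_penalized_loss(model: nn.Module,
                          inputs: torch.Tensor,
                          targets: torch.Tensor,
                          lam: float = 1e-3) -> torch.Tensor:
    """Squared loss plus an l1 (sparsity-inducing) penalty on all parameters.

    Illustrative only: the cited paper analyzes a sparse-penalized estimator
    under strongly mixing observations; the exact penalty and tuning differ.
    """
    preds = model(inputs)
    fit = nn.functional.mse_loss(preds, targets)               # empirical squared loss
    penalty = sum(p.abs().sum() for p in model.parameters())   # l1 norm of the weights
    return fit + lam * penalty
```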
arXiv Detail & Related papers (2024-06-12T15:21:51Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
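At its core, the stabilizer analyzed there is a linear-interpolation (lookahead-style) step between the current weights and the result of a few inner optimizer steps. A minimal sketch of that idea follows; the inner optimizer, step count, and mixing weight `alpha` are illustrative choices, not the paper's exact algorithm:

```python
import copy
import torch

def lookahead_step(model, optimizer, data_iter, loss_fn,
                   inner_steps: int = 5, alpha: float = 0.5):
    """One outer step: run a few fast inner updates, then linearly interpolate
    back toward the starting weights: theta <- (1 - alpha) * theta_start + alpha * theta_fast.
    """
    anchor = copy.deepcopy(model.state_dict())        # starting ("slow") weights
    for _ in range(inner_steps):                      # fast inner optimizer steps
        x, y = next(data_iter)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    with torch.no_grad():                             # linear interpolation step
        for name, p in model.named_parameters():
            p.copy_((1.0 - alpha) * anchor[name] + alpha * p)
```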
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably predict at an early stage of training whether divergence will occur.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- On the Lipschitz Constant of Deep Networks and Double Descent [5.381801249240512]
Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable.
We present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent.
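One common way to probe the empirical Lipschitz constant of a trained network is to take the largest input-gradient norm over a dataset, which lower-bounds the true constant. A minimal sketch under that assumed estimator (the cited paper's precise measurement protocol may differ):

```python
import torch

def empirical_lipschitz_lower_bound(model, inputs: torch.Tensor) -> float:
    """Largest input-gradient norm of the network over the given inputs.

    This lower-bounds the Lipschitz constant of the network; it is an
    illustrative estimator, not necessarily the one used in the cited paper.
    """
    best = 0.0
    for x in inputs:
        x = x.unsqueeze(0).clone().requires_grad_(True)
        out = model(x)
        out.norm().backward()                  # scalar surrogate for vector outputs
        best = max(best, x.grad.norm().item())
    return best
```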
arXiv Detail & Related papers (2023-01-28T23:22:49Z)
- Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation [16.128778192359327]
We learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one.
We prove our algorithm can achieve a suboptimality of $O(1/\sqrt{K})$, with dependence on the linear function dimension $d$.
arXiv Detail & Related papers (2022-09-14T13:17:59Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn to perform decision making from historical data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
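For KL uncertainty sets, the robust Bellman backup is typically evaluated through its standard dual representation from distributionally robust optimization. A worked statement of that identity, with assumed notation (nominal kernel $P^0$, radius $\sigma$, value function $V$); the paper's exact formulation may differ:

```latex
% Robust Bellman backup over a KL ball around the nominal transition kernel P^0
% (standard dual form from distributionally robust optimization; illustrative notation).
\inf_{P(\cdot\mid s,a):\,\mathrm{KL}\left(P \,\|\, P^0(\cdot\mid s,a)\right) \le \sigma}
  \;\mathbb{E}_{s' \sim P}\!\left[V(s')\right]
  \;=\;
  \sup_{\lambda \ge 0}
  \left\{
    -\lambda \log \mathbb{E}_{s' \sim P^0(\cdot\mid s,a)}\!\left[e^{-V(s')/\lambda}\right]
    \;-\; \lambda \sigma
  \right\}
```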
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
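Pessimism in this line of work is usually implemented by subtracting a lower-confidence-bound penalty from the Q-update so that rarely observed state-action pairs are valued conservatively. A minimal tabular sketch of that idea; the bonus form and constants below are illustrative, not the variance-reduced algorithm of the paper:

```python
import math
from collections import defaultdict

def pessimistic_q_update(Q, counts, s, a, r, s_next, actions,
                         gamma: float = 0.99, c: float = 1.0):
    """One pessimistic (LCB-penalized) Q-learning update on an offline transition.

    The penalty c / sqrt(N(s, a)) shrinks the target for rarely observed pairs;
    the cited paper uses a carefully tuned, variance-reduced version of this idea.
    """
    counts[(s, a)] += 1
    n = counts[(s, a)]
    lr = 1.0 / n                                          # simple decaying step size
    bonus = c / math.sqrt(n)                              # lower-confidence-bound penalty
    target = r + gamma * max(Q[(s_next, b)] for b in actions) - bonus
    Q[(s, a)] += lr * (target - Q[(s, a)])

# Example containers for the tabular case:
Q = defaultdict(float)
counts = defaultdict(int)
```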
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)