Neural Network Approximation for Pessimistic Offline Reinforcement
  Learning
        - URL: http://arxiv.org/abs/2312.11863v1
- Date: Tue, 19 Dec 2023 05:17:27 GMT
- Title: Neural Network Approximation for Pessimistic Offline Reinforcement
  Learning
- Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu
- Abstract summary: We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
- Score: 17.756108291816908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Deep reinforcement learning (RL) has shown remarkable success in specific
offline decision-making scenarios, yet its theoretical guarantees are still
under development. Existing works on offline RL theory primarily emphasize a
few trivial settings, such as linear MDP or general function approximation with
strong assumptions and independent data, which lack guidance for practical use.
The coupling of deep learning and Bellman residuals makes this problem
challenging, in addition to the difficulty of data dependence. In this paper,
we establish a non-asymptotic estimation error of pessimistic offline RL using
general neural network approximation with $\mathcal{C}$-mixing data regarding
the structure of networks, the dimension of datasets, and the concentrability
of data coverage, under mild assumptions. Our result shows that the estimation
error consists of two parts: the first converges to zero at a desired rate on
the sample size with partially controllable concentrability, and the second
becomes negligible if the residual constraint is tight. This result
demonstrates the explicit efficiency of deep adversarial offline RL frameworks.
We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and
the neural network approximation theory for the H\"{o}lder class to achieve
this. We also develop methods to bound the Bellman estimation error caused by
function approximation with empirical Bellman constraint perturbations.
Additionally, we present a result that lessens the curse of dimensionality
using data with low intrinsic dimensionality and function classes with low
complexity. Our estimation provides valuable insights into the development of
deep offline RL and guidance for algorithm model design.
 
      
        Related papers
        - Interpretable Deep Regression Models with Interval-Censored Failure Time   Data [1.2993568435938014]
 Deep learning methods for interval-censored data remain underexplored and limited to specific data type or model.
This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models.
Applying our method to the Alzheimer's Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared to traditional approaches.
 arXiv  Detail & Related papers  (2025-03-25T15:27:32Z)
- Deep learning from strongly mixing observations: Sparse-penalized   regularization and minimax optimality [0.0]
 We consider sparse-penalized regularization for deep neural network predictor.
We deal with the squared and a broad class of loss functions.
 arXiv  Detail & Related papers  (2024-06-12T15:21:51Z)
- Minimax Optimal and Computationally Efficient Algorithms for   Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
 Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
 arXiv  Detail & Related papers  (2024-03-14T17:55:10Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
 We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
 arXiv  Detail & Related papers  (2023-11-13T01:48:08Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
 This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
 arXiv  Detail & Related papers  (2023-10-20T12:45:12Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in
  Offline-RL [86.0987896274354]
 We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
 arXiv  Detail & Related papers  (2023-10-06T17:57:44Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
  Networks [89.28881869440433]
 This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs)
It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
 arXiv  Detail & Related papers  (2023-02-06T16:54:20Z)
- On the Lipschitz Constant of Deep Networks and Double Descent [5.381801249240512]
 Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable.
We present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent.
 arXiv  Detail & Related papers  (2023-01-28T23:22:49Z)
- Distributionally Robust Offline Reinforcement Learning with Linear
  Function Approximation [16.128778192359327]
 We learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one.
We prove our algorithm can achieve the suboptimality of $O(sqrtK)$ depending on the linear function dimension $d$.
 arXiv  Detail & Related papers  (2022-09-14T13:17:59Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with
  Near-Optimal Sample Complexity [39.886149789339335]
 offline reinforcement learning aims to learn to perform decision making from history data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
 arXiv  Detail & Related papers  (2022-08-11T11:55:31Z)
- Pessimism in the Face of Confounders: Provably Efficient Offline   Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
 We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the underlineProxy variable underlinePessimistic underlinePolicy underlineOptimization (textttP3O) algorithm.
textttP3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
 arXiv  Detail & Related papers  (2022-05-26T19:13:55Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
  Optimal Sample Complexity [51.476337785345436]
 We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
 arXiv  Detail & Related papers  (2022-02-28T15:39:36Z)
- The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
  Linear Networks [51.1848572349154]
 neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
 arXiv  Detail & Related papers  (2021-08-25T22:01:01Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
 Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
 arXiv  Detail & Related papers  (2021-06-06T19:08:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.