Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds
- URL: http://arxiv.org/abs/2309.13915v2
- Date: Sun, 14 Jan 2024 23:46:34 GMT
- Title: Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds
- Authors: Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
- Abstract summary: We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of the state space to escape from the curse of dimensionality.
- Score: 75.51968172401394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy gradient methods equipped with deep neural networks have achieved
great success in solving high-dimensional reinforcement learning (RL) problems.
However, current analyses cannot explain why they are resistant to the curse of
dimensionality. In this work, we study the sample complexity of the neural
policy mirror descent (NPMD) algorithm with deep convolutional neural networks
(CNN). Motivated by the empirical observation that many high-dimensional
environments have state spaces possessing low-dimensional structures, such as
those taking images as states, we consider the state space to be a
$d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with
intrinsic dimension $d\ll D$. We show that in each iteration of NPMD, both the
value function and the policy can be well approximated by CNNs. The
approximation errors are controlled by the size of the networks, and the
smoothness of the previous networks can be inherited. As a result, by properly
choosing the network size and hyperparameters, NPMD can find an
$\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$
samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of the
environment. Compared to previous work, our result shows that NPMD can leverage
the low-dimensional structure of the state space to escape from the curse of
dimensionality, explaining the efficacy of deep policy gradient algorithms.
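For intuition, each iteration of policy mirror descent with a KL divergence performs the multiplicative update $\pi_{k+1}(a\mid s)\propto\pi_k(a\mid s)\exp(\eta_k Q^{\pi_k}(s,a))$, and NPMD carries out this scheme with CNN approximations of the Q-function and the policy. The snippet below is a minimal tabular sketch of that update with Monte-Carlo Q estimates; the randomly generated toy MDP, the fixed step size `eta`, and the rollout budget are illustrative assumptions, and it shows only the generic mirror-descent step, not the authors' CNN-based implementation.
```python
import numpy as np

# Minimal tabular sketch of policy mirror descent (PMD) with Monte-Carlo Q estimates.
# Illustrative only: the toy MDP, step size `eta`, and rollout budget are assumptions,
# and no CNN function approximation (as used by NPMD) is included.

rng = np.random.default_rng(0)
S, A, gamma, eta = 4, 3, 0.9, 1.0

P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[s, a] over next states
r = rng.uniform(size=(S, A))                 # reward r[s, a]

def q_estimate(pi, n_rollouts=50, horizon=30):
    """Monte-Carlo estimate of Q^pi(s, a) from truncated rollouts."""
    Q = np.zeros((S, A))
    for s0 in range(S):
        for a0 in range(A):
            total = 0.0
            for _ in range(n_rollouts):
                s, a, ret, disc = s0, a0, 0.0, 1.0
                for _ in range(horizon):
                    ret += disc * r[s, a]
                    disc *= gamma
                    s = rng.choice(S, p=P[s, a])
                    a = rng.choice(A, p=pi[s])
                total += ret
            Q[s0, a0] = total / n_rollouts
    return Q

pi = np.full((S, A), 1.0 / A)                # start from the uniform policy
for k in range(10):
    Q = q_estimate(pi)                       # approximate policy evaluation
    logits = np.log(pi) + eta * Q            # PMD update: pi_new(a|s) proportional to pi(a|s) * exp(eta * Q[s, a])
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)

print("greedy actions after PMD:", pi.argmax(axis=1))
```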
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- High-Dimensional Smoothed Entropy Estimation via Dimensionality Reduction [14.53979700025531]
We consider the estimation of the differential entropy $h(X+Z)$ via $n$ independently and identically distributed samples of $X$.
Under the absolute-error loss, the above problem has a parametric estimation rate of $\frac{c^D}{\sqrt{n}}$.
We overcome this exponential sample complexity by projecting $X$ to a low-dimensional space via principal component analysis (PCA) before the entropy estimation.
arXiv Detail & Related papers (2023-05-08T13:51:48Z)
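A rough illustration of the projection-then-estimate idea above: project the samples onto their top principal directions and then apply a simple plug-in (Gaussian-mixture) estimate of the smoothed entropy $h(Y+Z)$ with Gaussian $Z$. The synthetic data, the noise level `sigma`, and the plug-in estimator itself are assumptions made for this sketch, not the estimator analyzed in the paper.
```python
import numpy as np

# Sketch: PCA projection before smoothed-entropy estimation.
# The plug-in estimator below (log of a Gaussian-mixture density at noisy samples)
# is a simple stand-in, not the paper's construction; the data model and sigma are assumed.

rng = np.random.default_rng(1)
n, D, d, sigma = 1000, 50, 3, 0.5

# Samples lying on a d-dimensional subspace of R^D.
basis = np.linalg.qr(rng.normal(size=(D, d)))[0]
X = rng.normal(size=(n, d)) @ basis.T

# PCA: keep the top-d principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:d].T                                    # n x d projected samples

# Plug-in estimate of h(Y + Z) with Z ~ N(0, sigma^2 I_d):
# the smoothed density is a Gaussian mixture centered at the projected samples.
W = Y + sigma * rng.normal(size=(n, d))
sq = ((W[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
log_kernel = -sq / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
log_density = np.logaddexp.reduce(log_kernel, axis=1) - np.log(n)
print("plug-in estimate of h(Y+Z) in nats:", -log_density.mean())
```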
- Sharp Lower Bounds on Interpolation by Deep ReLU Neural Networks at Irregularly Spaced Data [2.7195102129095003]
Deep ReLU neural networks can interpolate values at $N$ datapoints which are separated by a distance $\delta$.
We show that $\Omega(N)$ parameters are required in the regime where $\delta$ is exponentially small in $N$.
As an application, we give a lower bound on the approximation rates that deep ReLU neural networks can achieve for Sobolev spaces at the embedding endpoint.
arXiv Detail & Related papers (2023-02-02T02:46:20Z)
- Deep neural network expressivity for optimal stopping problems [2.741266294612776]
An optimal stopping problem can be approximated with error at most $\varepsilon$ by a deep ReLU neural network of size at most $\kappa d^{\mathfrak{q}} \varepsilon^{-\mathfrak{r}}$.
This proves that deep neural networks do not suffer from the curse of dimensionality when employed to solve optimal stopping problems.
arXiv Detail & Related papers (2022-10-19T10:22:35Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
- Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks [42.43493635899849]
We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification.
Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
arXiv Detail & Related papers (2021-09-07T02:58:11Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)