Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration
- URL: http://arxiv.org/abs/2209.07376v1
- Date: Thu, 15 Sep 2022 15:42:47 GMT
- Title: Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration
- Authors: Fanghui Liu, Luca Viano, Volkan Cevher
- Abstract summary: This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
- Score: 53.90873926758026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a theoretical study of deep neural function approximation
in reinforcement learning (RL) with the $\epsilon$-greedy exploration under the
online setting. This problem setting is motivated by the successful deep
Q-networks (DQN) framework that falls in this regime. In this work, we provide
an initial attempt at theoretically understanding deep RL from the perspective of
function class and neural networks architectures (e.g., width and depth) beyond
the "linear" regime. To be specific, we focus on the value-based algorithm with
the $\epsilon$-greedy exploration via deep (and two-layer) neural networks
endowed with Besov (and Barron) function spaces, respectively, which aims at
approximating an $\alpha$-smooth Q-function in a $d$-dimensional feature space.
We prove that, with $T$ episodes, scaling the width $m =
\widetilde{\mathcal{O}}(T^{\frac{d}{2\alpha + d}})$ and the depth
$L=\mathcal{O}(\log T)$ of the neural network for deep RL is sufficient for
learning with sublinear regret in Besov spaces. Moreover, for a two-layer
neural network endowed with the Barron space, scaling the width
$\Omega(\sqrt{T})$ is sufficient. To achieve this, the key issue in our
analysis is how to estimate the temporal difference error under deep neural
function approximation as the $\epsilon$-greedy exploration is not enough to
ensure "optimism". Our analysis reformulates the temporal difference error in
an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$,
and transforms it to a generalization problem under the non-iid setting. This
might have its own interest in RL theory for better understanding
$\epsilon$-greedy exploration in deep RL.
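The $\epsilon$-greedy rule analyzed in the abstract can be sketched as follows. This is a minimal illustration of the exploration mechanism, not the paper's algorithm; the function name and signature are assumptions:

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon, pick a uniformly random action (explore);
    otherwise pick an action maximizing the estimated Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a DQN-style setting, `q_values` would be the output of a forward pass of the Q-network at the current state, and `epsilon` is typically annealed over the $T$ episodes; the paper's analysis concerns exactly this value-based scheme, where such exploration does not by itself guarantee "optimism".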
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Mathematical Models of Computation in Superposition [0.9374652839580183]
Superposition poses a serious challenge to mechanistically interpreting current AI systems.
We present mathematical models of computation in superposition, where superposition is actively helpful for efficiently accomplishing the task.
We conclude by providing some potential applications of our work for interpreting neural networks that implement computation in superposition.
arXiv Detail & Related papers (2024-08-10T06:11:48Z) - Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN)
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z) - Rates of Approximation by ReLU Shallow Neural Networks [8.22379888383833]
We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the Hölder space.
Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
arXiv Detail & Related papers (2023-07-24T00:16:50Z) - An $L^2$ Analysis of Reinforcement Learning in High Dimensions with
Kernel and Neural Network Approximation [9.088303226909277]
This paper considers the situation where the function approximation is made using the kernel method or the two-layer neural network model.
We establish an $\tilde{O}(H^3|\mathcal{A}|^{\frac{1}{4}}n^{-\frac{1}{4}})$ bound for the optimal policy with $Hn$ samples.
Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.
arXiv Detail & Related papers (2021-04-15T21:59:03Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization
is Sufficient [9.309655246559094]
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(log(dl))$ wider and twice as deep.
Our analysis relies on connecting pruning random ReLU networks to random instances of the SubsetSum problem.
arXiv Detail & Related papers (2020-06-14T19:32:10Z) - Backward Feature Correction: How Deep Learning Performs Deep
(Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.