On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration
- URL: http://arxiv.org/abs/2310.16173v1
- Date: Tue, 24 Oct 2023 20:37:02 GMT
- Title: On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration
- Authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao
Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
- Abstract summary: This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning.
- Score: 86.71396285956044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a theoretical understanding of Deep Q-Network (DQN) with
the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite
the tremendous empirical achievement of the DQN, its theoretical
characterization remains underexplored. First, the exploration strategy is
either impractical or ignored in the existing analysis. Second, in contrast to
conventional Q-learning algorithms, the DQN employs the target network and
experience replay to acquire an unbiased estimation of the mean-square Bellman
error (MSBE) utilized in training the Q-network. However, the existing
theoretical analysis of DQNs lacks convergence analysis or bypasses the
technical challenges by deploying a significantly overparameterized neural
network, which is not computationally efficient. This paper provides the first
theoretical convergence and sample complexity analysis of the practical setting
of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with
decaying $\epsilon$ converges to the optimal Q-value function geometrically.
Moreover, a larger $\epsilon$ enlarges the region of
convergence but slows down the convergence, while the opposite holds for a
smaller $\epsilon$. Experiments justify our established
theoretical insights on DQNs.
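As a concrete illustration of the decaying-$\epsilon$ scheme the abstract describes, the following minimal Python sketch shows $\epsilon$-greedy action selection with a geometric decay schedule. This is our own simplified illustration, not the authors' algorithm; the initial rate, decay factor, and floor value are arbitrary choices for demonstration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (argmax-Q) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(t, eps0=1.0, decay=0.9, eps_min=0.01):
    """Geometrically decaying exploration rate:
    eps_t = max(eps0 * decay**t, eps_min).
    Illustrative constants; the paper's analysis covers decaying schedules
    of this kind, trading exploration for convergence speed."""
    return max(eps0 * decay ** t, eps_min)

# Example: at iteration t, act on current Q-value estimates.
q_estimates = [0.1, 0.5, 0.2]
action = epsilon_greedy(q_estimates, decayed_epsilon(t=10))
```

Under this schedule, early iterations explore broadly (large $\epsilon$, large region of convergence) while later iterations exploit the learned Q-values, matching the trade-off stated in the abstract.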
Related papers
- On the Global Convergence of Fitted Q-Iteration with Two-layer Neural
Network Parametrization [33.12181620473604]
We study Fitted Q-Iteration with a two-layer ReLU neural network parametrization and establish sample complexity guarantees for the algorithm.
We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^2)$, which is order-optimal.
arXiv Detail & Related papers (2022-11-14T19:00:24Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Toward Trainability of Deep Quantum Neural Networks [87.04438831673063]
Quantum Neural Networks (QNNs) with random structures have poor trainability due to the exponentially vanishing gradient as the circuit depth and the qubit number increase.
We provide the first viable solution to the vanishing gradient problem for deep QNNs with theoretical guarantees.
arXiv Detail & Related papers (2021-12-30T10:27:08Z)
- The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z)
- Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Deep Q-Learning: Theoretical Insights from an Asymptotic Analysis [3.9871041399267613]
Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network to approximate the well-known Q-function.
Although wildly successful under laboratory conditions, serious gaps between theory and practice as well as a lack of formal guarantees prevent its use in the real world.
We provide a theoretical analysis of a popular version of Deep Q-Learning under realistic verifiable assumptions.
arXiv Detail & Related papers (2020-08-25T07:59:20Z)
- A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
We derive a simple and analytic kernel function that is indeed equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.