On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration
- URL: http://arxiv.org/abs/2310.16173v1
- Date: Tue, 24 Oct 2023 20:37:02 GMT
- Title: On the Convergence and Sample Complexity Analysis of Deep Q-Networks
with $\epsilon$-Greedy Exploration
- Authors: Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao
Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
- Abstract summary: This paper provides a theoretical understanding of Deep Q-Network (DQN) with the $\varepsilon$-greedy exploration in deep reinforcement learning.
- Score: 86.71396285956044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a theoretical understanding of Deep Q-Network (DQN) with
the $\varepsilon$-greedy exploration in deep reinforcement learning. Despite
the tremendous empirical achievement of the DQN, its theoretical
characterization remains underexplored. First, the exploration strategy is
either impractical or ignored in the existing analysis. Second, in contrast to
conventional Q-learning algorithms, the DQN employs the target network and
experience replay to acquire an unbiased estimation of the mean-square Bellman
error (MSBE) utilized in training the Q-network. However, the existing
theoretical analysis of DQNs lacks convergence analysis or bypasses the
technical challenges by deploying a significantly overparameterized neural
network, which is not computationally efficient. This paper provides the first
theoretical convergence and sample complexity analysis of the practical setting
of DQNs with $\epsilon$-greedy policy. We prove an iterative procedure with
decaying $\epsilon$ converges to the optimal Q-value function geometrically.
Moreover, a larger $\epsilon$ enlarges the region of
convergence but slows down the convergence, while the opposite holds for a
smaller $\epsilon$. Experiments justify our established
theoretical insights on DQNs.
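As a concrete illustration of the decaying-$\epsilon$ scheme the abstract describes, the following minimal Python sketch shows $\epsilon$-greedy action selection with a geometric decay schedule. This is our own simplified illustration, not the authors' algorithm; the initial rate, decay factor, and floor value are arbitrary choices for demonstration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the greedy (argmax-Q) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(t, eps0=1.0, decay=0.9, eps_min=0.01):
    """Geometrically decaying exploration rate:
    eps_t = max(eps0 * decay**t, eps_min).
    Illustrative constants; the paper's analysis covers decaying schedules
    of this kind, trading exploration for convergence speed."""
    return max(eps0 * decay ** t, eps_min)

# Example: at iteration t, act on current Q-value estimates.
q_estimates = [0.1, 0.5, 0.2]
action = epsilon_greedy(q_estimates, decayed_epsilon(t=10))
```

Under this schedule, early iterations explore broadly (large $\epsilon$, large region of convergence) while later iterations exploit the learned Q-values, matching the trade-off stated in the abstract.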
Related papers
- On the Global Convergence of Fitted Q-Iteration with Two-layer Neural
Network Parametrization [33.12181620473604]
We study Fitted Q-Iteration with a two-layer ReLU neural network parametrization and establish sample complexity guarantees for the algorithm.
We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^2)$, which is order-optimal.
arXiv Detail & Related papers (2022-11-14T19:00:24Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Toward Trainability of Deep Quantum Neural Networks [87.04438831673063]
Quantum Neural Networks (QNNs) with random structures have poor trainability due to the exponentially vanishing gradient as the circuit depth and the qubit number increase.
We provide the first viable solution to the vanishing gradient problem for deep QNNs with theoretical guarantees.
arXiv Detail & Related papers (2021-12-30T10:27:08Z)
- The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z)
- Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Deep Q-Learning: Theoretical Insights from an Asymptotic Analysis [3.9871041399267613]
Deep Q-Learning is an important reinforcement learning algorithm, which involves training a deep neural network to approximate the well-known Q-function.
Although wildly successful under laboratory conditions, serious gaps between theory and practice as well as a lack of formal guarantees prevent its use in the real world.
We provide a theoretical analysis of a popular version of Deep Q-Learning under realistic verifiable assumptions.
arXiv Detail & Related papers (2020-08-25T07:59:20Z)
- A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
We derive a simple and analytic kernel function that is indeed equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.