Universal Approximation Theorem of Deep Q-Networks
- URL: http://arxiv.org/abs/2505.02288v1
- Date: Sun, 04 May 2025 22:57:33 GMT
- Title: Universal Approximation Theorem of Deep Q-Networks
- Authors: Qian Qi
- Abstract summary: We analyze Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability. This work bridges deep reinforcement learning and stochastic control, offering insights into DQNs in continuous-time settings.
- Score: 2.1756081703276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze DQN approximation properties. We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function $V^*$) in addressing potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights into DQNs in continuous-time settings, relevant for applications with physical systems or high-frequency data.
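The approximation result leans on residual-network approximation theorems, so a residual-style Q-network is the natural object to picture. Below is a minimal, illustrative PyTorch sketch of such a network mapping a state-action pair to a scalar Q-value; the class names (`ResidualBlock`, `ResidualDQN`), layer widths, and `depth` parameter are assumptions made for illustration, not the architecture specified in the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block: x + f(x), preserving the input dimension."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)

class ResidualDQN(nn.Module):
    """Residual Q-network: (state, action) -> scalar Q-value estimate."""
    def __init__(self, state_dim: int, action_dim: int, depth: int = 4, width: int = 64):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
        self.head = nn.Linear(width, 1)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)
        return self.head(self.blocks(self.embed(x))).squeeze(-1)

# Usage: q = ResidualDQN(state_dim=3, action_dim=2)
# q(torch.randn(8, 3), torch.randn(8, 2)) -> tensor of 8 Q-value estimates
```

Here `depth` stands in for the layer count whose interplay with the time discretization the abstract highlights.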
Related papers
- Approximation to Deep Q-Network by Stochastic Delay Differential Equations [0.0]
We construct a stochastic delay differential equation based on the Deep Q-Network algorithm and estimate the Wasserstein-1 distance between the two. We prove that this distance converges to zero as the step size approaches zero. Specifically, the delay term in the equation, corresponding to the target network, contributes to the stability of the system.
arXiv Detail & Related papers (2025-05-01T08:19:24Z)
- Stochastic Q-learning for Large Discrete Action Spaces [0.0]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$ (see the sketch after this list).
The presented value-based RL methods include, among others, Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach; we also test the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Network (DQN) with $\varepsilon$-greedy exploration in deep reinforcement learning (the sketch after this list also illustrates $\varepsilon$-greedy action selection).
arXiv Detail & Related papers (2023-10-24T20:37:02Z)
- An Analysis of Quantile Temporal-Difference Learning [53.36758478669685]
Quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper proves convergence, with probability 1, to the fixed points of a related family of dynamic programming procedures.
arXiv Detail & Related papers (2023-01-11T13:41:56Z)
- Optimal time for sensing in open quantum systems [0.0]
We study the time-dependent quantum Fisher information (QFI) in an open quantum system satisfying the Gorini-Kossakowski-Sudarshan-Lindblad master equation.
arXiv Detail & Related papers (2022-10-19T23:25:19Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z) - Characterizing the loss landscape of variational quantum circuits [77.34726150561087]
We introduce a way to compute the Hessian of the loss function of VQCs.
We show how this information can be interpreted and compared to classical neural networks.
arXiv Detail & Related papers (2020-08-06T17:48:12Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
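The Stochastic Q-learning and $\varepsilon$-greedy entries above both concern how actions are selected from a Q-function. The sketch below (referenced in those entries) illustrates the generic combination of $\varepsilon$-greedy exploration with greedy maximization over a small random subset of a large discrete action set; function and parameter names (`select_action`, `subset_size`) are assumptions for illustration, not the exact algorithm of either paper.

```python
import numpy as np

def select_action(q_values: np.ndarray, epsilon: float, subset_size: int,
                  rng: np.random.Generator) -> int:
    """Epsilon-greedy selection over a random subset of actions.

    q_values:    Q-value estimates for all n discrete actions (shape [n]).
    subset_size: number of candidate actions scored in the greedy step,
                 e.g. on the order of log(n) rather than n.
    """
    n = q_values.shape[0]
    if rng.random() < epsilon:
        return int(rng.integers(n))  # explore: uniform random action
    candidates = rng.choice(n, size=min(subset_size, n), replace=False)
    return int(candidates[np.argmax(q_values[candidates])])  # greedy over the sampled subset

# Usage: with n = 1024 actions, only ~log2(n) = 10 candidates are scored per step.
rng = np.random.default_rng(0)
q = rng.normal(size=1024)
a = select_action(q, epsilon=0.1, subset_size=10, rng=rng)
```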
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.