Universal Approximation Theorem for Deep Q-Learning via FBSDE System
- URL: http://arxiv.org/abs/2505.06023v1
- Date: Fri, 09 May 2025 13:11:55 GMT
- Title: Universal Approximation Theorem for Deep Q-Learning via FBSDE System
- Authors: Qian Qi
- Abstract summary: This paper establishes a Universal Approximation Theorem for a class of Deep Q-Networks (DQNs). We show that layers of a deep residual network, conceived as neural operators acting on function spaces, can approximate the action of the Bellman operator.
- Score: 2.1756081703276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The approximation capabilities of Deep Q-Networks (DQNs) are commonly justified by general Universal Approximation Theorems (UATs) that do not leverage the intrinsic structural properties of the optimal Q-function, the solution to a Bellman equation. This paper establishes a UAT for a class of DQNs whose architecture is designed to emulate the iterative refinement process inherent in Bellman updates. A central element of our analysis is the propagation of regularity: while the transformation induced by a single application of the Bellman operator exhibits regularity, for which the theory of Backward Stochastic Differential Equations (BSDEs) provides analytical tools, the uniform regularity of the entire sequence of value-iteration iterates (specifically, their uniform Lipschitz continuity on compact domains under standard Lipschitz assumptions on the problem data) is derived from finite-horizon dynamic programming principles. We demonstrate that layers of a deep residual network, conceived as neural operators acting on function spaces, can approximate the action of the Bellman operator. The resulting approximation theorem is thus intrinsically linked to the control problem's structure, offering a proof technique in which network depth corresponds directly to iterations of value-function refinement, with controlled error propagation. This perspective reveals a dynamical-systems view of the network's operation on a space of value functions.
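The depth-as-iterations correspondence can be made concrete on a toy finite MDP. The sketch below is an illustration under assumed toy dynamics (the MDP, `bellman`, and all parameters are hypothetical), not code from the paper: it stacks exact Bellman updates in residual form, Q_{k+1} = Q_k + (T Q_k - Q_k), where the paper's construction would replace the exact operator with neural-operator layers that approximate it.

```python
import numpy as np

# Toy finite MDP (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9                     # states, actions, discount factor
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
R = rng.uniform(0, 1, size=(S, A))          # bounded immediate rewards

def bellman(Q):
    """One application of the Bellman optimality operator T."""
    return R + gamma * P @ Q.max(axis=1)    # (S, A): one-step lookahead on greedy values

Q = np.zeros((S, A))
for k in range(60):                         # depth-as-iterations: layer k refines Q_k
    Q = Q + (bellman(Q) - Q)                # residual form: identity plus Bellman increment
print(np.abs(bellman(Q) - Q).max())         # sup-norm Bellman residual, shrinks like gamma**60
```

Because T is a gamma-contraction in the sup norm, a per-layer approximation error of at most epsilon accumulates to no more than epsilon / (1 - gamma) over the whole stack; this is the controlled error propagation the abstract refers to.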
Related papers
- Contraction, Criticality, and Capacity: A Dynamical-Systems Perspective on Echo-State Networks [13.857230672081489]
We present a unified, dynamical-systems treatment that weaves together functional analysis, random attractor theory and recent neuroscientific findings. First, we prove that the Echo-State Property (wash-out of initial conditions) together with global Lipschitz dynamics necessarily yields the Fading-Memory Property (a minimal wash-out sketch follows the list below). Second, employing a Stone-Weierstrass strategy we give a streamlined proof that ESNs with nonlinear reservoirs and linear read-outs are dense in the Banach space of causal, time-invariant fading-memory filters. Third, we quantify computational resources via the memory-capacity spectrum and show how...
arXiv Detail & Related papers (2025-07-24T14:41:18Z) - Spectral Bellman Method: Unifying Representation and Exploration in RL [54.71169912483302]
This work introduces Spectral Bellman Representation, a novel framework for learning representations for value-based reinforcement learning. We show that our learned representations enable structured exploration by aligning feature covariance with Bellman dynamics. Our framework naturally extends to powerful multi-step Bellman operators, further broadening its impact.
arXiv Detail & Related papers (2025-07-17T14:50:52Z) - Neural Expectation Operators [2.1756081703276]
This paper introduces Measure Learning, a paradigm for modeling ambiguity via non-linear expectations. We define Neural Expectation Operators as solutions to Backward Stochastic Differential Equations (BSDEs) whose drivers are parameterized by neural networks. We provide constructive methods for enforcing key axiomatic properties, such as convexity, by architectural design.
arXiv Detail & Related papers (2025-07-13T06:19:28Z) - Neural Hamiltonian Operator [2.1756081703276]
An alternative to traditional dynamic programming is Pontryagin's Maximum Principle (PMP), which recasts the problem as a system of Forward-Backward Stochastic Differential Equations (FBSDEs). In this paper, we introduce a formal framework for solving such problems with deep learning by defining a Neural Hamiltonian Operator (NHO). We show how the optimal NHO can be found by training the underlying networks to enforce the consistency conditions dictated by the PMP.
arXiv Detail & Related papers (2025-07-02T02:56:49Z) - Generalization Bounds and Model Complexity for Kolmogorov-Arnold Networks [1.5850926890180461]
Kolmogorov-Arnold Network (KAN) is a network structure recently proposed by Liu et al. This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KANs equipped with activation functions.
arXiv Detail & Related papers (2024-10-10T15:23:21Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and show how linear interpolation can help by leveraging the theory of nonexpansive operators (see the averaging sketch after this list).
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Lipschitz constant estimation for 1D convolutional neural networks [0.0]
We propose a dissipativity-based method for Lipschitz constant estimation of 1D convolutional neural networks (CNNs).
In particular, we analyze the dissipativity properties of convolutional, pooling, and fully connected layers.
arXiv Detail & Related papers (2022-11-28T12:09:06Z) - Universality and approximation bounds for echo state networks with random weights [5.310194156119441]
We study the uniform approximation of echo state networks with randomly generated internal weights.
Recent results showed that echo state networks with ReLU activation are universal.
arXiv Detail & Related papers (2022-06-12T05:43:16Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the complexity of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
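For the echo-state entry above, here is a minimal wash-out sketch (my construction under assumed reservoir parameters, not the authors' code). A tanh reservoir whose recurrent matrix has operator norm below one is a contraction, so two trajectories driven by the same input forget their differing initial states.

```python
import numpy as np

# Hypothetical reservoir (illustration only; not from the paper).
rng = np.random.default_rng(1)
N = 100
W = rng.normal(size=(N, N))
W *= 0.9 / np.linalg.norm(W, 2)       # operator norm < 1 => tanh reservoir is a contraction
W_in = rng.normal(size=N)

def step(x, u):
    return np.tanh(W @ x + W_in * u)  # globally Lipschitz reservoir map

x_a = rng.normal(size=N)              # two different initial conditions...
x_b = rng.normal(size=N)
for t in range(200):
    u = np.sin(0.1 * t)               # ...driven by the same input sequence
    x_a, x_b = step(x_a, u), step(x_b, u)
print(np.linalg.norm(x_a - x_b))      # ~0: bounded by 0.9**200 times the initial gap
```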
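For the linear-interpolation entry above, here is a minimal sketch of the averaging scheme x_{k+1} = (1 - lam) * x_k + lam * T(x_k) (again an illustration, not the paper's code). For a nonexpansive map T, here a plane rotation with fixed point at the origin, plain iteration cycles forever while the averaged (Krasnosel'skii-Mann) iteration converges.

```python
import numpy as np

theta = np.pi / 2                                # 90-degree rotation: nonexpansive,
T = np.array([[np.cos(theta), -np.sin(theta)],   # fixed point at the origin, but not
              [np.sin(theta),  np.cos(theta)]])  # a contraction

def iterate(lam, steps=200):
    x = np.array([1.0, 0.0])
    for _ in range(steps):
        x = (1 - lam) * x + lam * (T @ x)        # linear interpolation toward T(x)
    return np.linalg.norm(x)

print(iterate(lam=1.0))   # plain iteration: stays on the unit circle forever
print(iterate(lam=0.5))   # averaged iteration: contracts to the fixed point
```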