Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration
- URL: http://arxiv.org/abs/2504.10865v1
- Date: Tue, 15 Apr 2025 04:56:33 GMT
- Title: Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration
- Authors: Han-Dong Lim, Donghwan Lee
- Abstract summary: We study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation. We consider two sufficient conditions for the existence of a solution to PBE.
- Score: 6.663174194579773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the theoretical properties of the projected Bellman equation (PBE) and two algorithms to solve this equation: linear Q-learning and approximate value iteration (AVI). We consider two sufficient conditions for the existence of a solution to the PBE: the strictly negatively row dominating diagonal (SNRDD) assumption and a condition motivated by the convergence of AVI. The SNRDD assumption also ensures the convergence of linear Q-learning, and its relationship with the convergence of AVI is examined. Lastly, several interesting observations on the solution of the PBE are provided when using an $\epsilon$-greedy policy.
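To make these objects concrete, here is a minimal sketch (not code from the paper; the random MDP, random features, unweighted Euclidean projection, and step sizes are illustrative assumptions): it runs the linear Q-learning update with an $\epsilon$-greedy behavior policy and then measures how far the learned weights are from solving the projected Bellman equation $\Phi\theta = \Pi T(\Phi\theta)$.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, d, gamma = 5, 2, 3, 0.9

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = distribution over next states
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # rewards for each (s, a)
Phi = rng.normal(size=(nS * nA, d))            # one feature row per (s, a) pair

def feat(s, a):
    return Phi[s * nA + a]

def q_values(theta, s):
    return np.array([feat(s, a) @ theta for a in range(nA)])

def eps_greedy(theta, s, eps):
    if rng.random() < eps:
        return int(rng.integers(nA))
    return int(np.argmax(q_values(theta, s)))

# Linear Q-learning:
# theta <- theta + alpha * phi(s,a) * (r + gamma * max_a' phi(s',a')^T theta - phi(s,a)^T theta)
theta, alpha, eps = np.zeros(d), 0.05, 0.1
s = 0
for _ in range(20_000):
    a = eps_greedy(theta, s, eps)
    s_next = int(rng.choice(nS, p=P[s, a]))
    td = R[s, a] + gamma * np.max(q_values(theta, s_next)) - feat(s, a) @ theta
    theta += alpha * td * feat(s, a)
    s = s_next

# PBE residual: a solution would satisfy Phi @ theta = Pi T(Phi @ theta).
# (Plain Euclidean projection is used here for simplicity; convergence of the
# iteration above is not guaranteed in general -- that is what SNRDD-type
# conditions address.)
q = Phi @ theta
v = q.reshape(nS, nA).max(axis=1)            # greedy one-step values per state
Tq = (R + gamma * P @ v).reshape(-1)         # Bellman optimality operator applied to q
Pi = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)
print("PBE residual:", np.linalg.norm(q - Pi @ Tq))
```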
Related papers
- Regularized Q-learning through Robust Averaging [3.4354636842203026]
We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner.
One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance.
We show that 2RA Q-learning converges to the optimal policy and analyze its theoretical mean-squared error.
arXiv Detail & Related papers (2024-05-03T15:57:26Z)
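The "underlying estimation bias" mentioned in this summary is the familiar overestimation effect of maximizing over noisy value estimates; a minimal numerical illustration (not from the paper, with purely synthetic numbers) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)                                        # ten actions, all with true value 0
noisy_q = true_q + rng.normal(0.0, 1.0, size=(100_000, 10))  # unbiased noisy estimates
print("max of true values:        ", true_q.max())                # 0.0
print("mean of max over estimates:", noisy_q.max(axis=1).mean())  # ~1.5, biased upward
```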
- On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration [86.71396285956044]
This paper provides a theoretical understanding of Deep Q-Network (DQN) with $\varepsilon$-greedy exploration in deep reinforcement learning.
arXiv Detail & Related papers (2023-10-24T20:37:02Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
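The connection to nonexpansive operators can be illustrated with the classical Krasnosel'skii-Mann averaged iteration, which is the basic form of a linear interpolation step (a sketch under assumed toy settings, not the paper's method; the rotation operator below is chosen only for illustration).

```python
import numpy as np

theta = np.pi / 2
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation: nonexpansive, fixed point at 0

def run(lam, steps=200):
    x = np.array([1.0, 0.0])
    for _ in range(steps):
        x = (1 - lam) * x + lam * (T @ x)         # linear interpolation toward T(x)
    return np.linalg.norm(x)

print("plain iteration    (lam=1.0):", run(1.0))  # stays on the unit circle, never converges
print("interpolated step  (lam=0.5):", run(0.5))  # contracts toward the fixed point 0
```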
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
- Learning Bellman Complete Representations for Offline Policy Evaluation [51.96704525783913]
Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage.
We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL.
arXiv Detail & Related papers (2022-07-12T21:02:02Z)
- Solving Constrained Variational Inequalities via an Interior Point Method [88.39091990656107]
We develop an interior-point approach to solve constrained variational inequality (cVI) problems.
We provide convergence guarantees for ACVI in two general classes of problems.
Unlike previous work in this setting, ACVI provides a means to solve cVIs when the constraints are nontrivial.
arXiv Detail & Related papers (2022-06-21T17:55:13Z)
- Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
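The reverse experience replay component named in the title above can be illustrated on a toy chain (the chain, single action, and step size below are assumptions for illustration, not the paper's Q-Rex algorithm).

```python
import numpy as np

n, gamma, alpha = 6, 0.9, 1.0
Q = np.zeros(n)   # single-action chain: state i moves to i+1, reward 1 on the final step

# one stored trajectory: (state, reward, next_state); Q of the terminal state stays 0
trajectory = [(i, 1.0 if i == n - 2 else 0.0, i + 1) for i in range(n - 1)]

for (s, r, s_next) in reversed(trajectory):        # replay transitions in reverse order
    Q[s] += alpha * (r + gamma * Q[s_next] - Q[s])

print(Q)  # one reverse pass already propagates the reward along the whole chain;
          # a forward pass would have updated only the state next to the terminal one
```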
- Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
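As a back-of-the-envelope illustration of how that order behaves (the example numbers are assumptions, and constants and logarithmic factors are ignored):

```python
# order-of-magnitude only: the bound is stated up to constants and log factors
S, A, gamma, eps = 100, 10, 0.99, 0.1          # illustrative values, not from the paper
print(S * A / ((1 - gamma) ** 4 * eps ** 2))   # ~1e13: the (1-gamma)^-4 horizon term dominates
```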
- The Mean-Squared Error of Double Q-Learning [33.610380346576456]
We establish a theoretical comparison between the mean-squared error of Double Q-learning and Q-learning.
We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning when Double Q-learning uses twice the learning rate.
arXiv Detail & Related papers (2020-07-09T19:15:17Z)
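For reference, a minimal sketch of the tabular Double Q-learning update being compared (the standard two-estimator update; the MDP sizes, step size, and transition are placeholders, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 4, 3, 0.9, 0.1
QA = np.zeros((nS, nA))
QB = np.zeros((nS, nA))

def double_q_update(s, a, r, s_next):
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))          # QA selects the action ...
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])  # ... QB evaluates it
    else:
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

double_q_update(s=0, a=1, r=1.0, s_next=2)           # one illustrative transition
print(QA[0, 1], QB[0, 1])                            # exactly one of the two tables moved
```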
- Finite-Sample Analysis of Stochastic Approximation Using Smooth Convex Envelopes [40.31139355952393]
We construct a smooth Lyapunov function using the generalized envelope, and show that the iterates of SA have negative drift with respect to that Lyapunov function.
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
We also use it to study TD-learning in the on-policy setting, and recover the existing state-of-the-art results for $Q$-learning.
arXiv Detail & Related papers (2020-02-03T16:42:01Z)
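For context, one standard form of the generalized (Moreau-type) smooth envelope referred to in the entry above is, stated here as an assumption rather than a quote of the paper's notation,

$$M_f^{\mu,g}(x) \;=\; \min_{u \in \mathbb{R}^d}\Big\{\, f(u) + \tfrac{1}{\mu}\, g(x-u) \,\Big\},$$

i.e., the infimal convolution of $f$ with a convex $g$ scaled by $1/\mu$; under suitable smoothness conditions on $g$ this envelope is differentiable, which is what makes the negative-drift Lyapunov argument possible.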
This list is automatically generated from the titles and abstracts of the papers on this site.