Regularized Q-learning
- URL: http://arxiv.org/abs/2202.05404v7
- Date: Wed, 23 Oct 2024 01:23:04 GMT
- Title: Regularized Q-learning
- Authors: Han-Dong Lim, Donghwan Lee
- Abstract summary: This paper develops a new Q-learning algorithm that converges when linear function approximation is used.
We experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge.
- Score: 6.663174194579773
- Abstract: Q-learning is a widely used algorithm in the reinforcement learning community. In the lookup-table setting, its convergence is well established; however, its behavior is known to be unstable under linear function approximation. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm, and we establish its stability using a recent analysis tool based on switching system models. Moreover, we experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.
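The paper's core prescription is easy to illustrate: keep the standard semi-gradient Q-learning update with linear features, but add a regularization term on the weight vector. The sketch below is a minimal rendering of that idea with l2 regularization; the environment interface, the feature map `phi`, and the coefficient `eta` are illustrative assumptions, not the authors' exact algorithm or constants.

```python
import numpy as np

def regularized_q_learning(env, phi, n_actions, n_features,
                           alpha=0.05, gamma=0.99, eta=0.1,
                           n_steps=50_000, epsilon=0.1, seed=0):
    """Q-learning with linear function approximation plus an l2
    regularization term on the weights, i.e. Q(s, a) ~ phi(s, a) @ theta.
    A minimal sketch of the paper's idea (regularize to stabilize);
    the interface and constants here are assumptions."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)
    s = env.reset()
    for _ in range(n_steps):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = max(range(n_actions), key=lambda b: phi(s, b) @ theta)
        s_next, r, done = env.step(a)
        # semi-gradient TD target with a max over next actions
        q_next = 0.0 if done else max(phi(s_next, b) @ theta
                                      for b in range(n_actions))
        td_error = r + gamma * q_next - phi(s, a) @ theta
        # the extra "- eta * theta" term is the regularization that
        # the paper argues restores convergence
        theta += alpha * (td_error * phi(s, a) - eta * theta)
        s = env.reset() if done else s_next
    return theta
```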
Related papers
- Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation [27.17913040244775]
We study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence.
By connecting constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance.
We also establish a Central Limit Theorem for the Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates.
Moreover, the asymptotic bias of the averaged iterates is proportional to the stepsize up to higher-order terms, and we provide an explicit expression for the linear coefficient.
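Read as stated, the bias claim admits a compact form. The display below is a hedged paraphrase in generic notation, not the paper's own statement:

```latex
% Paraphrase of the stated expansion: the averaged constant-stepsize
% iterate has bias (relative to the fixed point theta^*) that is linear
% in the stepsize alpha, with an explicit linear coefficient B.
\[
  \lim_{t \to \infty} \mathbb{E}\bigl[\bar{\theta}^{(\alpha)}_t\bigr]
  \;-\; \theta^*
  \;=\; \alpha\, B \;+\; O(\alpha^2).
\]
```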
arXiv Detail & Related papers (2024-01-25T02:01:53Z)
- Stability of Q-Learning Through Design and Optimism [0.0]
This paper is in part a tutorial on stochastic approximation and Q-learning.
It provides details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy, France, in June 2023.
The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms.
arXiv Detail & Related papers (2023-07-05T20:04:26Z)
- Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
- Iterative regularization in classification via hinge loss diagonal descent [12.684351703991965]
Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning.
In this paper, we focus on iterative regularization in the context of classification.
arXiv Detail & Related papers (2022-12-24T07:15:26Z)
- Online Regularized Learning Algorithm for Functional Data [2.5382095320488673]
This paper considers an online regularized learning algorithm in reproducing kernel Hilbert spaces.
It shows that the convergence rates of both the prediction error and the estimation error with constant step-size are competitive with those in the literature.
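For concreteness, one common form of such an online regularized update in a reproducing kernel Hilbert space is sketched below; the notation is ours and illustrates the general scheme, not the paper's exact algorithm:

```latex
% Illustrative online regularized RKHS update (generic notation):
% given kernel K, streaming data (x_t, y_t), regularization
% parameter \lambda, and constant step-size \eta,
\[
  f_{t+1} \;=\; f_t \;-\; \eta \Bigl( \bigl( f_t(x_t) - y_t \bigr)\, K(x_t, \cdot)
                                      \;+\; \lambda f_t \Bigr).
\]
```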
arXiv Detail & Related papers (2022-11-24T11:56:10Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Matrix Completion via Non-Convex Relaxation and Adaptive Correlation Learning [90.8576971748142]
We develop a novel surrogate that can be optimized by closed-form solutions.
We further exploit correlations for completion, which leads to an adaptive correlation learning model.
arXiv Detail & Related papers (2022-03-04T08:50:50Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach for handling streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
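As a rough illustration of OGD for pairwise learning, the sketch below runs online gradient descent on a pairwise hinge loss with a linear scorer, pairing each new example with all previously seen ones (the classic, expensive formulation). The loss, pairing scheme, and step-size schedule are assumptions for illustration, not the paper's proposed methods.

```python
import numpy as np

def ogd_pairwise(stream, dim, step=0.1):
    """Online gradient descent for pairwise learning with a pairwise
    hinge loss on a linear model.  Each new example is paired with all
    examples seen so far; this is a sketch of the baseline scheme, not
    the paper's lower-cost variants."""
    w = np.zeros(dim)
    seen = []                        # (x, y) examples observed so far
    for t, (x_t, y_t) in enumerate(stream, start=1):
        grad = np.zeros(dim)
        for x_s, y_s in seen:
            if y_t == y_s:
                continue             # hinge loss uses label-discordant pairs
            sign = 1.0 if y_t > y_s else -1.0
            margin = sign * w @ (x_t - x_s)
            if margin < 1.0:         # violated pair contributes a subgradient
                grad -= sign * (x_t - x_s)
        if seen:
            w -= (step / np.sqrt(t)) * grad / len(seen)
        seen.append((x_t, y_t))
    return w
```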
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
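A minimal instance of such a temporal difference method is TD(0) with linear function approximation, which seeks the solution of a projected Bellman fixed-point equation over the span of the features. The sketch below assumes an illustrative environment/policy interface:

```python
import numpy as np

def td0_linear(env, policy, phi, n_features,
               alpha=0.05, gamma=0.99, n_steps=100_000, seed=0):
    """TD(0) policy evaluation with linear function approximation:
    V(s) ~ phi(s) @ w.  The limit point solves a projected (Bellman)
    fixed-point equation over the feature subspace -- the setting the
    paper analyzes.  The env/policy interface is an assumption."""
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    s = env.reset()
    for _ in range(n_steps):
        a = policy(s, rng)
        s_next, r, done = env.step(a)
        v_next = 0.0 if done else phi(s_next) @ w
        td_error = r + gamma * v_next - phi(s) @ w
        w += alpha * td_error * phi(s)   # semi-gradient TD(0) update
        s = env.reset() if done else s_next
    return w
```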
arXiv Detail & Related papers (2020-12-09T20:19:32Z)