Stability of Q-Learning Through Design and Optimism
- URL: http://arxiv.org/abs/2307.02632v2
- Date: Mon, 21 Aug 2023 14:14:04 GMT
- Title: Stability of Q-Learning Through Design and Optimism
- Authors: Sean Meyn
- Abstract summary: This paper is in part a tutorial on stochastic approximation and Q-learning.
It provides details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy, France, June 2023.
The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Q-learning has become an important part of the reinforcement learning toolkit
since its introduction in the dissertation of Chris Watkins in the 1980s. The
purpose of this paper is in part a tutorial on stochastic approximation and
Q-learning, providing details regarding the INFORMS APS inaugural Applied
Probability Trust Plenary Lecture, presented in Nancy, France, June 2023.
The paper also presents new approaches to ensure stability and potentially
accelerated convergence for these algorithms, and stochastic approximation in
other settings. Two contributions are entirely new:
1. Stability of Q-learning with linear function approximation has been an
open topic for research for over three decades. It is shown that with
appropriate optimistic training in the form of a modified Gibbs policy, there
exists a solution to the projected Bellman equation, and the algorithm is
stable (in terms of bounded parameter estimates). Convergence remains one of
many open topics for research.
2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson
flow without matrix inversion. It is stable and convergent under mild
assumptions on the mean flow vector field for the algorithm, and compatible
statistical assumptions on an underlying Markov chain. The algorithm is a
general approach to stochastic approximation which in particular applies to
Q-learning with "oblivious" training even with non-linear function
approximation.
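To make the first contribution concrete, here is a minimal, hypothetical sketch of Q-learning with linear function approximation in which training is driven by a Gibbs (softmax) policy over the current Q-estimates. The toy environment, feature basis, temperature, and step-size rule are illustrative assumptions; the paper's modified Gibbs policy and its stability conditions are not reproduced here.

```python
# Hypothetical sketch: Q-learning with linear function approximation driven by
# a Gibbs (softmax) training policy. The toy environment, feature basis,
# temperature, and step sizes are illustrative assumptions, not the paper's
# modified Gibbs policy or its stability conditions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
dim = n_states * n_actions
# Tabular (one-hot) basis for simplicity, so the recursion is known to behave;
# a general feature map phi(s, a) would replace this array.
phi = np.eye(dim).reshape(n_states, n_actions, dim)
theta = np.zeros(dim)                                 # parameter estimates
gamma, beta = 0.95, 5.0                               # discount factor, inverse temperature

def q(s, a):
    """Linear approximation Q_theta(s, a) = phi(s, a)^T theta."""
    return phi[s, a] @ theta

def gibbs_policy(s):
    """Softmax ('Gibbs') distribution over actions at state s."""
    logits = beta * np.array([q(s, a) for a in range(n_actions)])
    logits -= logits.max()                            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def env_step(s, a):
    """Toy Markov transition and reward, standing in for the true environment."""
    s_next = int(rng.integers(n_states))
    reward = -abs(s - a) + 0.1 * rng.normal()
    return reward, s_next

s = 0
for n in range(1, 50_000):
    alpha = 1.0 / n ** 0.85                           # diminishing step size
    a = int(rng.choice(n_actions, p=gibbs_policy(s))) # soft ("optimistic") exploration
    reward, s_next = env_step(s, a)
    target = reward + gamma * max(q(s_next, b) for b in range(n_actions))
    theta += alpha * (target - q(s, a)) * phi[s, a]   # stochastic-approximation update
    s = s_next
```

A zero of the mean of this update, taken under the stationary distribution induced by the training policy, is a solution of the projected Bellman equation referred to above; the paper's result is that a suitably modified Gibbs policy guarantees such a solution exists and that the parameter estimates remain bounded.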
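For the second contribution, the following is a minimal two-timescale sketch of the idea behind Zap Zero as described in the abstract: a fast recursion tracks the Newton-Raphson direction (the negative inverse Jacobian applied to the mean flow vector field) so that no matrix is ever inverted, while a slow recursion moves the parameter along that direction. The synthetic linear root-finding problem, noise model, and step sizes are assumptions for illustration; the exact Zap Zero recursions and conditions are given in the paper.

```python
# Hypothetical sketch of the matrix-inversion-free idea behind Zap Zero:
# a fast recursion tracks the Newton-Raphson direction while a slow recursion
# follows it. The synthetic linear root-finding problem, noise model, and
# step sizes are illustrative assumptions, not the paper's construction.
import numpy as np

rng = np.random.default_rng(1)
dim = 4

# Synthetic problem: find theta* solving fbar(theta*) = 0 with
# fbar(theta) = b - G @ theta, observed only through noisy samples.
M = rng.normal(size=(dim, dim))
G = M @ M.T / dim + np.eye(dim)          # positive definite, so the Jacobian A = -G is Hurwitz
b = rng.normal(size=dim)
theta_star = np.linalg.solve(G, b)       # ground truth, used only for the final error check

theta = np.zeros(dim)                    # slow iterate: parameter estimate
v = np.zeros(dim)                        # fast iterate: tracks the Newton direction -A^{-1} fbar(theta)

for n in range(1, 100_000):
    alpha = 1.0 / n                      # slow step size
    beta = 0.5 / n ** 0.7                # fast step size (eventually much larger than alpha)

    f_obs = b - G @ theta + 0.1 * rng.normal(size=dim)     # noisy observation of fbar(theta)
    A_obs = -G + 0.1 * rng.normal(size=(dim, dim))         # noisy observation of the Jacobian

    v = v + beta * (A_obs @ v + f_obs)   # fast recursion: equilibrium solves A v + fbar = 0
    theta = theta + alpha * v            # slow recursion: approximate Newton-Raphson flow

print("distance to root:", np.linalg.norm(theta - theta_star))
```

The fast iterate has mean dynamics whose equilibrium is the Newton-Raphson direction whenever the Jacobian is Hurwitz, so the slow iterate approximately follows the Newton-Raphson flow without any matrix inversion.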
Related papers
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS)
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Online Learning Under A Separable Stochastic Approximation Framework [20.26530917721778]
We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework.
We show that the proposed algorithm produces more robust training and test performance when compared to other popular learning algorithms.
arXiv Detail & Related papers (2023-05-12T13:53:03Z)
- Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
- Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the Frank-Wolfe algorithm for stochastic finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z)
- q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z)
- Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
arXiv Detail & Related papers (2020-08-08T17:17:42Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)