q-Learning in Continuous Time
- URL: http://arxiv.org/abs/2207.00713v3
- Date: Mon, 24 Apr 2023 00:18:09 GMT
- Title: q-Learning in Continuous Time
- Authors: Yanwei Jia and Xun Yu Zhou
- Abstract summary: We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a "q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
- Score: 1.4213973379473654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the continuous-time counterpart of Q-learning for reinforcement
learning (RL) under the entropy-regularized, exploratory diffusion process
formulation introduced by Wang et al. (2020). As the conventional (big)
Q-function collapses in continuous time, we consider its first-order
approximation and coin the term "(little) q-function". This function is
related to the instantaneous advantage rate function as well as the
Hamiltonian. We develop a "q-learning" theory around the q-function that is
independent of time discretization. Given a stochastic policy, we jointly
characterize the associated q-function and value function by martingale
conditions of certain stochastic processes, in both on-policy and off-policy
settings. We then apply the theory to devise different actor-critic algorithms
for solving underlying RL problems, depending on whether or not the density
function of the Gibbs measure generated from the q-function can be computed
explicitly. One of our algorithms interprets the well-known Q-learning
algorithm SARSA, and another recovers a policy gradient (PG) based
continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct
simulation experiments to compare the performance of our algorithms with those
of PG-based algorithms in Jia and Zhou (2022b) and time-discretized
conventional Q-learning algorithms.
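To make the construction concrete, below is a minimal, hypothetical Python sketch of a q-learning-style actor-critic loop on a toy one-dimensional controlled diffusion, with a discretized action grid so that the density of the Gibbs measure generated from the q-function can be computed explicitly (the first case mentioned in the abstract). The linear feature parameterization, the toy dynamics and reward, the omission of the entropy term and of explicit time dependence, and all names (`features`, `gibbs_policy`, `step_env`, etc.) are assumptions made purely for illustration; this is not the authors' algorithm or implementation.

```python
# Hypothetical illustration of a continuous-time q-learning-style actor-critic loop.
# The critic maintains estimates of the value function J and the (little) q-function;
# the actor samples from the Gibbs measure pi(a|x) proportional to exp(q(x, a) / gamma).
# Time discretization (dt) is used only to simulate data. This is a simplified sketch,
# not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

actions = np.linspace(-1.0, 1.0, 21)   # discretized action grid (assumption)
gamma_temp = 0.1                        # entropy-regularization temperature
dt = 0.01                               # simulation step for data collection
lr = 0.1                                # learning rate

def features(x):
    """Quadratic features for a linear parameterization (an assumption)."""
    return np.array([1.0, x, x * x])

theta_J = np.zeros(3)                   # value-function weights
theta_q = np.zeros((3, actions.size))   # q-function weights, one column per action

def J(x):
    return features(x) @ theta_J

def q(x):
    return features(x) @ theta_q        # vector of q(x, a) over the action grid

def gibbs_policy(x):
    """Density of the Gibbs measure generated from the current q-function estimate."""
    logits = q(x) / gamma_temp
    logits -= logits.max()              # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def step_env(x, a):
    """Toy controlled diffusion dX = a dt + sigma dW with reward r = -(x^2 + a^2)."""
    sigma = 0.3
    x_next = x + a * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x_next, -(x * x + a * a)

def update(x):
    """One on-policy step: use the discretized residual
       delta = J(x') - J(x) + (r - q(x, a)) * dt
    as a crude stand-in for the martingale condition and take semi-gradient steps."""
    p = gibbs_policy(x)
    a_idx = rng.choice(actions.size, p=p)
    x_next, r = step_env(x, actions[a_idx])
    delta = J(x_next) - J(x) + (r - q(x)[a_idx]) * dt
    theta_J[:] += lr * delta * features(x)
    theta_q[:, a_idx] += lr * delta * features(x)
    return x_next

x = 0.5
for _ in range(1000):
    x = update(x)
```

When the density of the Gibbs measure cannot be computed explicitly (the other case mentioned in the abstract), the sampling step above would instead use a separately parameterized stochastic policy trained by the actor; that distinction is what separates the different actor-critic algorithms devised in the paper.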
Related papers
- Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy [8.924830900790713]
This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization.
We study two financial applications, namely, an optimal portfolio liquidation problem and a non-LQ control problem.
arXiv Detail & Related papers (2024-07-04T12:26:31Z)
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods, including Q-learning, StochDQN, and StochDDQN, all integrate this approach for both value-function updates and action selection; a minimal sketch of the sampled-maximum idea follows this entry.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
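A rough, hypothetical sketch of the sampled-maximum idea summarized above, assuming a tabular Q-table and a uniformly sampled action subset of size about $\log_2 n$; the subset-construction details of the actual StochDQN/StochDDQN methods are not reproduced here.

```python
# Hypothetical sketch: replace the full max over n actions in the Q-learning target
# with a max over a small random subset of actions (size ~ O(log n)).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_max(q_row, k):
    """Approximate argmax/max of Q(s, .) using k randomly sampled actions."""
    idx = rng.choice(q_row.shape[0], size=min(k, q_row.shape[0]), replace=False)
    best = idx[np.argmax(q_row[idx])]
    return best, q_row[best]

def stochastic_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular Q-learning step whose bootstrap target uses the sampled maximum."""
    k = max(1, int(np.ceil(np.log2(Q.shape[1]))))   # ~O(log n) sampled actions
    _, q_max = stochastic_max(Q[s_next], k)
    Q[s, a] += alpha * (r + gamma * q_max - Q[s, a])

# Example usage on a toy tabular problem (purely illustrative).
Q = np.zeros((5, 1024))                              # 5 states, n = 1024 actions
stochastic_q_update(Q, s=0, a=3, r=1.0, s_next=2)
```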
- Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z)
- Continuous-time q-learning for mean-field control problems [4.3715546759412325]
We study q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems.
We show that the two q-functions are related via an integral representation under all test policies.
Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised.
arXiv Detail & Related papers (2023-06-28T13:43:46Z)
- Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and Convex Q-Learning in Continuous Time [1.4050836886292872]
This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective.
The main contributions are: (i) an algorithm design based on a new Q-ODE, which gives a model-free characterization of the Hamilton-Jacobi-Bellman equation; and (ii) a characterization of boundedness of the constraint region, obtained through a non-trivial extension of recent results from the discrete-time setting.
arXiv Detail & Related papers (2022-10-14T21:55:57Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum; a tabular sketch of the double-estimator update follows this entry.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
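For reference, a minimal tabular sketch of the double-estimator update that the finite-time analysis above concerns, in its standard asynchronous (per-sample) form; the paper's synchronous variant updates all state-action pairs at once. This is an illustrative sketch, not the paper's exact setting.

```python
# Tabular double Q-learning step: two estimators QA and QB; one is chosen at random
# to select the greedy action, while the other evaluates that action in the target.
import numpy as np

rng = np.random.default_rng(0)

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))                                # select with QA
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])    # evaluate with QB
    else:
        a_star = int(np.argmax(QB[s_next]))                                # select with QB
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])    # evaluate with QA
```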
- Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
arXiv Detail & Related papers (2020-08-08T17:17:42Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
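As a rough illustration of what a momentum-based Q-learning update looks like, here is a generic heavy-ball-style tabular rule that folds past update directions into the current one. The specific MomentumQ recursion analyzed in the paper (and its Nesterov-style variants) may differ; this is only a hedged sketch under that assumption.

```python
# Hypothetical heavy-ball-style momentum on the tabular Q-learning update direction.
import numpy as np

def momentum_q_update(Q, M, s, a, r, s_next, alpha=0.1, beta=0.9, gamma=0.99):
    """Q is the value table; M is a same-shaped buffer accumulating past TD directions."""
    td = r + gamma * np.max(Q[s_next]) - Q[s, a]     # standard TD error
    M[s, a] = beta * M[s, a] + td                    # accumulate momentum
    Q[s, a] += alpha * M[s, a]                       # update along the momentum direction

# Example usage (illustrative): 5 states, 4 actions.
Q = np.zeros((5, 4))
M = np.zeros_like(Q)
momentum_q_update(Q, M, s=0, a=1, r=1.0, s_next=3)
```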
This list is automatically generated from the titles and abstracts of the papers in this site.