Continuous-time q-learning for mean-field control problems
- URL: http://arxiv.org/abs/2306.16208v4
- Date: Fri, 01 Nov 2024 09:07:41 GMT
- Title: Continuous-time q-learning for mean-field control problems
- Authors: Xiaoli Wei, Xiang Yu,
- Abstract summary: We study the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (2023), for continuous time Mckean-Vlasov control problems.
We show that two q-functions are related via an integral representation under all test policies.
Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised.
- Score: 4.3715546759412325
- License:
- Abstract: This paper studies the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (2023), for continuous time Mckean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. In two examples, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.
Related papers
- Unified continuous-time q-learning for mean-field game and mean-field control problems [4.416317245952636]
We introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization together with the value function.
We devise a unified q-learning algorithm for both mean-field game (MFG) and mean-field control (MFC) problems.
For several examples in the jump-diffusion setting, within and beyond the LQ framework, we can obtain the exact parameterization of the decoupled Iq-functions and the value functions.
arXiv Detail & Related papers (2024-07-05T14:06:59Z) - Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy [8.924830900790713]
This paper studies the continuous-time reinforcement learning in jump-diffusion models by featuring the q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization.
We study two financial applications, namely, an optimal portfolio liquidation problem and a non-LQ control problem.
arXiv Detail & Related papers (2024-07-04T12:26:31Z) - Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for
Zero-Shot Commonsense Question Answering [48.25449258017601]
State-of-the-art approaches fine-tune language models on QA pairs constructed from CommonSense Knowledge Bases.
We propose QADYNAMICS, a training dynamics-driven framework for QA diagnostics and refinement.
arXiv Detail & Related papers (2023-10-17T14:27:34Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs)
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time
Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.