Unified continuous-time q-learning for mean-field game and mean-field control problems
- URL: http://arxiv.org/abs/2407.04521v1
- Date: Fri, 5 Jul 2024 14:06:59 GMT
- Title: Unified continuous-time q-learning for mean-field game and mean-field control problems
- Authors: Xiaoli Wei, Xiang Yu, Fengyi Yuan
- Abstract summary: We introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization together with the value function.
We devise a unified q-learning algorithm for both mean-field game (MFG) and mean-field control (MFC) problems.
For several examples in the jump-diffusion setting, within and beyond the LQ framework, we can obtain the exact parameterization of the decoupled Iq-functions and the value functions.
- Score: 4.416317245952636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies continuous-time q-learning in mean-field jump-diffusion models from the representative agent's perspective. To overcome the challenge that the population distribution may not be directly observable, we introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization together with the value function, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, depending on whether the task is to solve the MFG or the MFC problem, we can employ the decoupled Iq-function in different ways to learn the mean-field equilibrium policy or the mean-field optimal policy, respectively. As a result, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing all test policies stemming from the mean-field interactions. For several examples in the jump-diffusion setting, within and beyond the LQ framework, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our algorithm from the representative agent's perspective with satisfactory performance.
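To fix ideas, in the single-agent continuous-time q-learning framework of Jia and Zhou (2023), which this paper extends to the mean-field setting, policy evaluation is characterized by a martingale condition roughly of the following form (the notation here is illustrative and not taken from the abstract: $\beta$ is a discount rate, $r$ the running reward, $J$ a candidate value function and $q$ a candidate q-function along the state process $X$ with actions $a_s$):

$$ e^{-\beta t} J(t, X_t) + \int_0^t e^{-\beta s}\big[\, r(s, X_s, a_s) - q(s, X_s, a_s) \,\big]\,\mathrm{d}s \quad \text{is an } (\mathcal{F}_t)\text{-martingale}, $$

with the entropy-regularized policy recovered from the q-function as $\pi(a \mid t, x) \propto \exp\{ q(t, x, a)/\gamma \}$ for a temperature $\gamma > 0$. The decoupled Iq-function introduced in this paper plays an analogous role in the mean-field setting, so that the same martingale test yields a unified policy evaluation rule for both MFG and MFC problems.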
Related papers
- Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy [8.924830900790713]
This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization.
We study two financial applications, namely, an optimal portfolio liquidation problem and a non-LQ control problem.
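For reference, the Tsallis entropy of a policy density $\pi$ over the action space $\mathcal{A}$ with entropic index $q \neq 1$ (here $q$ denotes the index, not the q-function; the notation is assumed rather than taken from the abstract) is

$$ S_q(\pi) = \frac{1}{q-1}\left(1 - \int_{\mathcal{A}} \pi(a)^q \,\mathrm{d}a\right), $$

which recovers the Shannon (differential) entropy $-\int_{\mathcal{A}} \pi(a)\log\pi(a)\,\mathrm{d}a$ in the limit $q \to 1$.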
arXiv Detail & Related papers (2024-07-04T12:26:31Z) - Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces [1.4999444543328293]
We present a reinforcement learning (RL) algorithm designed to solve mean field games (MFG) and mean field control (MFC) problems in a unified manner.
The proposed approach pairs the actor-critic (AC) paradigm with a representation of the mean field distribution via a parameterized score function.
A modification of the algorithm allows us to solve mixed mean field control games (MFCGs).
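One common way to use a parameterized score function is to sample from the represented distribution with Langevin dynamics. The sketch below is only a minimal illustration of that idea, not the cited paper's algorithm: it replaces the neural score network with a closed-form Gaussian score so the example is self-contained.

```python
import numpy as np

# Minimal sketch: represent a distribution through its score (gradient of the
# log density) and draw samples with unadjusted Langevin dynamics. In the cited
# setting the score would be a trained neural network; a Gaussian score stands in.

def score(x, mean=1.0, var=0.25):
    # score of N(mean, var): d/dx log density = -(x - mean) / var
    return -(x - mean) / var

rng = np.random.default_rng(0)
eps = 1e-3                        # Langevin step size (assumed)
x = rng.standard_normal(10_000)   # initial particles

for _ in range(5_000):
    x = x + 0.5 * eps * score(x) + np.sqrt(eps) * rng.standard_normal(x.shape)

print(f"sample mean ~ {x.mean():.3f}, sample variance ~ {x.var():.3f}")  # close to 1.0 and 0.25
```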
arXiv Detail & Related papers (2023-09-19T22:37:47Z) - Continuous-time q-learning for mean-field control problems [4.3715546759412325]
We study q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2023), for continuous-time McKean-Vlasov control problems.
We show that two q-functions are related via an integral representation under all test policies.
Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised.
arXiv Detail & Related papers (2023-06-28T13:43:46Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for the linear-quadratic mean-field control and game.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z) - Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems [49.80033982995667]
This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system.
A third-party real-time application service provider offers computing services to the subscribed mobile users (MUs) using the limited communication and computation resources of the infrastructure provider (InP).
We derive a novel deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor.
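For context, the double Q-learning idea behind double deep Q-networks can be sketched in tabular form: two estimators are updated in alternation, each using the other to evaluate its own greedy action, which mitigates overestimation bias. The toy below is a generic illustration only, with assumed dynamics and rewards; it does not model the paper's deep networks or its post-decision Q-factor.

```python
import numpy as np

# Generic tabular double Q-learning sketch (illustrative only).
rng = np.random.default_rng(0)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # toy transition kernel (assumed)
R = rng.standard_normal((S, A))              # toy reward table (assumed)
gamma, alpha = 0.9, 0.1
Q1, Q2 = np.zeros((S, A)), np.zeros((S, A))
x = 0
for _ in range(50_000):
    a = rng.integers(A) if rng.random() < 0.1 else int((Q1 + Q2)[x].argmax())
    x_next = rng.choice(S, p=P[x, a])
    r = R[x, a]
    if rng.random() < 0.5:                   # update Q1, evaluate its greedy action with Q2
        a_star = int(Q1[x_next].argmax())
        Q1[x, a] += alpha * (r + gamma * Q2[x_next, a_star] - Q1[x, a])
    else:                                    # update Q2, evaluate its greedy action with Q1
        a_star = int(Q2[x_next].argmax())
        Q2[x, a] += alpha * (r + gamma * Q1[x_next, a_star] - Q2[x, a])
    x = x_next
print("greedy policy:", (Q1 + Q2).argmax(axis=1))
```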
arXiv Detail & Related papers (2020-07-15T21:32:43Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
An implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs).
A semi-implicit actor (SIA) is powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - Unified Reinforcement Q-Learning for Mean Field Game and Control Problems [0.0]
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon Mean Field Game (MFG) and Mean Field Control (MFC) problems.
Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters.
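A minimal sketch of that two-timescale structure, with toy dynamics and learning rates that are not taken from the paper: a single Q-table and a population distribution mu are updated simultaneously, and, per the cited paper, the ratio of the two learning rates determines whether the iteration targets the MFG or the MFC solution.

```python
import numpy as np

# Illustrative two-timescale mean-field Q-learning sketch (toy model, assumed
# dynamics and reward): Q and the population distribution mu are updated with
# different learning rates; their ratio selects the MFG or MFC regime.
rng = np.random.default_rng(0)
S, A = 5, 2
P = rng.dirichlet(np.ones(S), size=(S, A))   # toy transition kernel (assumed)
gamma = 0.9
rho_Q, rho_mu = 0.1, 0.01                    # distribution updated on the slower timescale here
Q = np.zeros((S, A))
mu = np.full(S, 1.0 / S)                     # estimate of the population state distribution
x = 0
for _ in range(20_000):
    a = rng.integers(A) if rng.random() < 0.1 else int(Q[x].argmax())
    x_next = rng.choice(S, p=P[x, a])
    r = -0.1 * np.log(mu[x] + 1e-8) - 0.05 * a       # toy crowd-aversion reward (assumed)
    Q[x, a] += rho_Q * (r + gamma * Q[x_next].max() - Q[x, a])   # fast timescale: Q-learning update
    e_x = np.zeros(S); e_x[x] = 1.0
    mu += rho_mu * (e_x - mu)                        # slow timescale: track visited states
    x = x_next
print("greedy policy:", Q.argmax(axis=1), "learned mu:", np.round(mu, 3))
```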
arXiv Detail & Related papers (2020-06-24T17:45:44Z) - Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space.
Our method can be easily plugged into existing GAN models, with little computational overhead in training.
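As a rough illustration of a shared discrete feature space, the sketch below performs generic vector quantization with an exponential-moving-average codebook update; it is a stand-in borrowed from common VQ practice, not the paper's exact procedure or hyperparameters.

```python
import numpy as np

# Generic feature-quantization sketch (illustrative only): snap continuous
# features to their nearest codebook entry, then refresh the codebook with an
# exponential moving average of the features assigned to each code.
rng = np.random.default_rng(0)
K, D = 16, 8                              # codebook size and feature dimension (assumed)
codebook = rng.standard_normal((K, D))

def quantize(features, codebook, ema=0.99):
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K) squared distances
    idx = d2.argmin(axis=1)                                            # nearest code per feature
    for k in np.unique(idx):                                           # in-place EMA codebook update
        codebook[k] = ema * codebook[k] + (1 - ema) * features[idx == k].mean(axis=0)
    return codebook[idx], idx

feats = rng.standard_normal((32, D))      # stand-in for discriminator features
q_feats, idx = quantize(feats, codebook)
print("codes:", idx[:8], "mean quantization error:", float(np.abs(q_feats - feats).mean()))
```

In a GAN this quantization would sit inside the discriminator and require a straight-through gradient trick, which the sketch omits.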
arXiv Detail & Related papers (2020-04-05T04:06:50Z) - Conditional Self-Attention for Query-based Summarization [49.616774159367516]
We propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling.
Experiments on Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms vanilla Transformer.
arXiv Detail & Related papers (2020-02-18T02:22:31Z)