Actor critic learning algorithms for mean-field control with moment
neural networks
- URL: http://arxiv.org/abs/2309.04317v1
- Date: Fri, 8 Sep 2023 13:29:57 GMT
- Title: Actor critic learning algorithms for mean-field control with moment
neural networks
- Authors: Huyên Pham and Xavier Warin
- Abstract summary: We develop a new policy gradient and actor-critic algorithm for solving mean-field control problems.
The learning for both the actor (policy) and critic (value function) is facilitated by a class of moment neural network functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a new policy gradient and actor-critic algorithm for solving
mean-field control problems within a continuous time reinforcement learning
setting. Our approach leverages a gradient-based representation of the value
function, employing parametrized randomized policies. The learning for both the
actor (policy) and critic (value function) is facilitated by a class of moment
neural network functions on the Wasserstein space of probability measures, and
the key feature is to directly sample trajectories of distributions. A central
challenge addressed in this study pertains to the computational treatment of an
operator specific to the mean-field framework. To illustrate the effectiveness
of our methods, we provide a comprehensive set of numerical results. These
encompass diverse examples, including multi-dimensional settings and nonlinear
quadratic mean-field control problems with controlled volatility.
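As a rough orientation only, the Python sketch below (a minimal illustration, not the authors' implementation) shows the two ingredients the abstract mentions: networks that see a probability distribution only through a fixed vector of empirical moments, and an actor-critic update computed from sampled trajectories of the empirical distribution of a controlled particle system. All names and modelling choices here (MomentCritic, GaussianActor, the cost function, the discrete-time Euler step) are assumptions made for this sketch; in particular it omits the paper's continuous-time analysis and its treatment of the mean-field-specific operator.

    # Minimal illustrative sketch (not the authors' code): both networks see the
    # population distribution only through a fixed vector of empirical moments, and
    # learning uses sampled trajectories of the empirical distribution of a controlled
    # particle system. All names and constants are hypothetical choices for this sketch.
    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    def empirical_moments(x, orders=(1, 2, 3, 4)):
        """Particle cloud x of shape (N, d) -> concatenated empirical moments, shape (len(orders)*d,)."""
        return torch.cat([(x ** p).mean(dim=0) for p in orders])

    class MomentCritic(nn.Module):
        """Value function V(t, mu) approximated as a network of time and the moments of mu."""
        def __init__(self, d, n_orders=4, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(1 + n_orders * d, hidden), nn.Tanh(),
                                     nn.Linear(hidden, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 1))
        def forward(self, t, x):
            return self.net(torch.cat([t.view(1), empirical_moments(x)]))

    class GaussianActor(nn.Module):
        """Randomized policy pi(. | t, state, moments of mu): Gaussian with a learned mean."""
        def __init__(self, d, d_a, n_orders=4, hidden=64):
            super().__init__()
            self.mean = nn.Sequential(nn.Linear(1 + d + n_orders * d, hidden), nn.Tanh(),
                                      nn.Linear(hidden, d_a))
            self.log_std = nn.Parameter(torch.zeros(d_a))
        def dist(self, t, x):
            m = empirical_moments(x).expand(x.shape[0], -1)
            tt = t.expand(x.shape[0], 1)
            return Normal(self.mean(torch.cat([tt, x, m], dim=1)), self.log_std.exp())

    d, d_a, n_particles, n_steps, dt, sigma = 1, 1, 256, 20, 0.05, 0.3
    actor, critic = GaussianActor(d, d_a), MomentCritic(d)
    opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

    for _ in range(200):                       # one actor-critic update per sampled trajectory
        x = torch.randn(n_particles, d)        # initial particle cloud approximating mu_0
        log_probs, rewards, values = [], [], []
        for k in range(n_steps):
            t = torch.tensor([k * dt])
            pi = actor.dist(t, x)
            a = pi.sample()
            # illustrative mean-field running cost: track the population mean, penalize control
            cost = ((x - x.mean(dim=0)) ** 2).mean() + 0.1 * (a ** 2).mean()
            rewards.append(-cost * dt)
            log_probs.append(pi.log_prob(a).sum(dim=1).mean())
            values.append(critic(t, x))
            x = x + a * dt + sigma * (dt ** 0.5) * torch.randn(n_particles, d)  # Euler particle step
        returns = torch.stack(rewards).flip(0).cumsum(0).flip(0)   # reward-to-go along the trajectory
        values = torch.cat(values)
        advantage = (returns - values).detach()
        actor_loss = -(torch.stack(log_probs) * advantage).sum()
        critic_loss = ((values - returns.detach()) ** 2).mean()
        opt.zero_grad()
        (actor_loss + critic_loss).backward()
        opt.step()

One appeal of a moment parametrisation is that the networks' input dimension stays fixed no matter how many particles are used to represent the distribution.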
Related papers
- Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898] (arXiv, 2024-10-18)
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy.
We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
- Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning [13.908826484332282] (arXiv, 2024-05-03)
Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time.
This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task.
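For reference, one generic way to write such a constrained multi-task objective, together with the Lagrangian on which primal-dual policy gradient methods typically operate, is sketched below; the symbols $J_i$, $c_i$, $\lambda_i$ are illustrative and need not match the paper's exact formulation.

    \max_{\theta}\ \frac{1}{N}\sum_{i=1}^{N} J_i(\theta)
    \quad\text{subject to}\quad J_i(\theta) \ge c_i,\ i = 1,\dots,N,
    \qquad
    \mathcal{L}(\theta,\lambda) = \frac{1}{N}\sum_{i=1}^{N} J_i(\theta)
    + \sum_{i=1}^{N} \lambda_i\bigl(J_i(\theta) - c_i\bigr),\quad \lambda_i \ge 0,

where $J_i(\theta)$ is the expected return of policy $\pi_\theta$ on task $i$, the policy parameters $\theta$ are updated by (natural) gradient ascent on $\mathcal{L}$, and the multipliers $\lambda_i$ by projected descent.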
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971] (arXiv, 2024-04-04)
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
- Actor-Critic learning for mean-field control in continuous time [0.0] (arXiv, 2023-03-13)
We study policy gradient for mean-field control in continuous time in a reinforcement learning setting.
By considering randomised policies with entropy regularisation, we derive a gradient expectation representation of the value function.
In the linear-quadratic mean-field framework, we obtain an exact parametrisation of the actor and critic functions defined on the Wasserstein space.
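For context, a generic entropy-regularised mean-field control objective of the kind referred to above can be written as follows; the notation (running cost f, terminal cost g, temperature lambda) and the sign conventions are illustrative rather than the paper's exact statement.

    V(t,\mu) = \inf_{\pi}\ \mathbb{E}\Bigl[\int_t^T \bigl(f(X_s,\mathbb{P}_{X_s},\alpha_s)
    + \lambda \log \pi(\alpha_s \mid X_s,\mathbb{P}_{X_s})\bigr)\,ds + g(X_T,\mathbb{P}_{X_T})\Bigr],
    \qquad
    dX_s = b(X_s,\mathbb{P}_{X_s},\alpha_s)\,ds + \sigma(X_s,\mathbb{P}_{X_s},\alpha_s)\,dW_s,
    \quad \alpha_s \sim \pi(\cdot \mid X_s,\mathbb{P}_{X_s}),\ X_t \sim \mu,

where the value function V is defined on the Wasserstein space of probability measures, which is where the exact actor and critic parametrisations mentioned above live in the linear-quadratic case.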
- Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583] (arXiv, 2021-12-27)
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632] (arXiv, 2021-06-22)
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
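As a rough illustration of the reweighting idea described above, the numpy sketch below performs one weighted least-squares fitted Q-iteration step in which each sample's Bellman residual is downweighted by an estimated variance; the function fqi_step, the ridge term, and the crude variance handling are assumptions of this sketch, not the paper's VA-OPE estimator.

    # Schematic variance-reweighted fitted Q-iteration step with linear features.
    # Names (fqi_step, phi_sa, var_est, ...) and the toy data are hypothetical.
    import numpy as np

    def fqi_step(phi_sa, rewards, phi_next_pi, w_old, var_est, gamma=0.99, ridge=1e-3):
        """One weighted least-squares regression of Q onto Bellman targets.

        phi_sa      : (n, k) features of sampled (state, action) pairs
        rewards     : (n,)   observed rewards
        phi_next_pi : (n, k) expected next-state features under the target policy
        var_est     : (n,)   estimated variance of the regression target at each sample
        """
        targets = rewards + gamma * phi_next_pi @ w_old      # Bellman targets under the target policy
        weights = 1.0 / np.maximum(var_est, 1e-6)            # downweight high-variance samples
        A = phi_sa.T @ (weights[:, None] * phi_sa) + ridge * np.eye(phi_sa.shape[1])
        b = phi_sa.T @ (weights * targets)
        return np.linalg.solve(A, b)                          # new weight vector for Q(s, a) = phi(s, a) @ w

    # toy usage with random data
    rng = np.random.default_rng(0)
    n, k = 500, 8
    phi_sa = rng.normal(size=(n, k))
    phi_next_pi = rng.normal(size=(n, k))
    rewards = rng.normal(size=n)
    var_est = np.ones(n)
    w = np.zeros(k)
    for _ in range(50):
        w = fqi_step(phi_sa, rewards, phi_next_pi, w, var_est)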
- Multi-task Supervised Learning via Cross-learning [102.64082402388192] (arXiv, 2020-10-24)
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve learning performance on the other tasks.
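A small illustrative sketch of the coupling idea in that summary: each task fits a linear regressor to its own data while a quadratic proximity term pulls all task parameters toward a shared central variable. The function cross_learn, the penalty weight lam, and the averaging step are choices made for this sketch and need not match the paper's formulation.

    # Schematic parameter coupling across regression tasks (illustrative only).
    import numpy as np

    def cross_learn(tasks, lam=1.0, n_iters=100, lr=0.05):
        """tasks: list of (X_t, y_t) pairs; returns per-task weights and the shared center."""
        k = tasks[0][0].shape[1]
        W = [np.zeros(k) for _ in tasks]
        center = np.zeros(k)
        for _ in range(n_iters):
            for t, (X, y) in enumerate(tasks):
                # task-specific least-squares gradient plus a pull toward the shared center
                grad = X.T @ (X @ W[t] - y) / len(y) + lam * (W[t] - center)
                W[t] = W[t] - lr * grad
            center = np.mean(W, axis=0)       # shared variable = average of task parameters
        return W, center

    # toy usage: three related linear tasks whose true weights are small perturbations of each other
    rng = np.random.default_rng(1)
    w_star = rng.normal(size=5)
    tasks = []
    for _ in range(3):
        X = rng.normal(size=(40, 5))
        y = X @ (w_star + 0.1 * rng.normal(size=5)) + 0.01 * rng.normal(size=40)
        tasks.append((X, y))
    W, center = cross_learn(tasks)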
- Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes [112.38662246621969] (arXiv, 2020-10-16)
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities.
We compute unbiased navigation gradients of the value function which we use as ascent directions to update the policy.
A major drawback of policy gradient-type algorithms is that they are limited to episodic tasks unless stationarity assumptions are imposed.
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086] (arXiv, 2020-02-10)
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.