Fully Distributed Actor-Critic Architecture for Multitask Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2110.12306v1
- Date: Sat, 23 Oct 2021 21:57:43 GMT
- Title: Fully Distributed Actor-Critic Architecture for Multitask Deep
Reinforcement Learning
- Authors: Sergio Valcarcel Macua, Ian Davies, Aleksi Tukiainen, Enrique Munoz de
Cote
- Abstract summary: We propose a fully distributed actor-critic architecture, named Diff-DAC, with application to multitask reinforcement learning (MRL).
Agents communicate their value and policy parameters to their neighbours, diffusing the information across a network of agents with no need for a central station.
We prove almost sure convergence of Diff-DAC to a common policy under general assumptions that hold even for deep-neural network approximations.
- Score: 6.628062414583634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a fully distributed actor-critic architecture, named Diff-DAC,
with application to multitask reinforcement learning (MRL). During the learning
process, agents communicate their value and policy parameters to their
neighbours, diffusing the information across a network of agents with no need
for a central station. Each agent can only access data from its local task, but
aims to learn a common policy that performs well for the whole set of tasks.
The architecture is scalable, since the computational and communication cost
per agent depends on the number of neighbours rather than the overall number of
agents. We derive Diff-DAC from duality theory and provide novel insights into
the actor-critic framework, showing that it is actually an instance of the dual
ascent method. We prove almost sure convergence of Diff-DAC to a common policy
under general assumptions that hold even for deep-neural network
approximations. For more restrictive assumptions, we also prove that this
common policy is a stationary point of an approximation of the original
problem. Numerical results on multitask extensions of common continuous control
benchmarks demonstrate that Diff-DAC stabilises learning and has a regularising
effect that induces higher performance and better generalisation properties
than previous architectures.
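To make the diffusion mechanism concrete, below is a minimal NumPy sketch (not the authors' implementation) of the adapt-then-combine pattern that Diff-DAC-style architectures follow: each agent takes a local actor-critic gradient step on its own task and then averages its policy and value parameters with those of its neighbours through a doubly stochastic combination matrix. The ring topology, step size, and the placeholder local_gradients function are illustrative assumptions, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 8  # number of agents / size of the flattened parameter vectors

# Doubly stochastic combination matrix over a ring: each agent only exchanges
# parameters with its two neighbours, so no central station is needed.
A = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    A[i, i] = 0.5
    A[i, (i + 1) % n_agents] = 0.25
    A[i, (i - 1) % n_agents] = 0.25

theta = rng.normal(size=(n_agents, dim))    # per-agent policy (actor) parameters
omega = rng.normal(size=(n_agents, dim))    # per-agent value (critic) parameters
targets = rng.normal(size=(n_agents, dim))  # stand-in for each agent's distinct local task

def local_gradients(i, theta_i, omega_i):
    # Placeholder gradients: pull each agent towards its own task's optimum.
    return theta_i - targets[i], omega_i - targets[i]

step = 0.05
for _ in range(200):
    # Adapt: every agent takes a gradient step using only its local task data.
    for i in range(n_agents):
        g_theta, g_omega = local_gradients(i, theta[i], omega[i])
        theta[i] -= step * g_theta
        omega[i] -= step * g_omega
    # Combine: every agent averages parameters with its neighbours (diffusion),
    # spreading information across the network without any central node.
    theta = A @ theta
    omega = A @ omega

# The agents' policies cluster around a common set of parameters that trades
# off all local tasks, rather than each agent's individual optimum.
print("max policy disagreement:", np.abs(theta - theta.mean(axis=0)).max())

Because the combine step only involves each agent's neighbours, the per-agent computation and communication cost scales with the neighbourhood size rather than with the total number of agents, which is the scalability property highlighted in the abstract.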
Related papers
- On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations [15.549340968605234]
Federated reinforcement learning (FedRL) enables multiple agents to collaboratively learn a policy without sharing their local trajectories collected during agent-environment interactions.
We introduce a personalized FedRL framework (PFedRL) by taking advantage of possibly shared common structure among agents in heterogeneous environments.
arXiv Detail & Related papers (2024-11-22T15:42:43Z) - Causal Coordinated Concurrent Reinforcement Learning [8.654978787096807]
We propose a novel algorithmic framework for data sharing and coordinated exploration for the purpose of learning more data-efficient and better performing policies under a concurrent reinforcement learning setting.
Our algorithm leverages a causal inference algorithm in the form of Additive Noise Model - Mixture Model (ANM-MM) in extracting model parameters governing individual differentials via independence enforcement.
We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum and cart-pole swing-up tasks.
arXiv Detail & Related papers (2024-01-31T17:20:28Z) - Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning [46.28771270378047]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained by offline MARL often inherits random or low-quality behaviour present in the pre-collected data, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning [24.567125948995834]
Federated reinforcement learning is a framework in which $N$ agents collaboratively learn a global model.
We show that, through careful collaboration of the agents in solving the joint fixed-point problem underlying this framework, the global model can be found $N$ times faster than by a single agent.
arXiv Detail & Related papers (2022-06-21T08:39:12Z) - FedAvg with Fine Tuning: Local Updates Lead to Representation Learning [54.65133770989836]
The Federated Averaging (FedAvg) algorithm alternates between a few local gradient updates at client nodes and a model averaging update at the server (a minimal sketch of this loop appears after this list).
We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks.
We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
arXiv Detail & Related papers (2022-05-27T00:55:24Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via
Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system.
Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers.
We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
arXiv Detail & Related papers (2021-05-10T08:39:56Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy actor-critic algorithm (DR-Off-PAC) for discounted MDPs.
DR-Off-PAC adopts a single-timescale structure, in which both the actor and the critics are updated simultaneously with constant step size.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
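The FedAvg entry above alternates a few local gradient updates at each client with a model-averaging step at the server. The sketch below shows that loop in its simplest form on a toy least-squares objective; the synthetic per-client data, step sizes, and round counts are assumptions made purely for illustration and are not taken from the cited paper.

import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, n_samples = 5, 10, 50

# Each client holds its own (heterogeneous) linear-regression data.
X = [rng.normal(size=(n_samples, dim)) for _ in range(n_clients)]
w_true = [rng.normal(size=dim) for _ in range(n_clients)]
y = [X[k] @ w_true[k] + 0.1 * rng.normal(size=n_samples) for k in range(n_clients)]

w_global = np.zeros(dim)
lr, local_steps, rounds = 0.01, 5, 50

for _ in range(rounds):
    local_models = []
    for k in range(n_clients):
        w = w_global.copy()
        # Local phase: a few gradient steps on the client's own data.
        for _ in range(local_steps):
            grad = X[k].T @ (X[k] @ w - y[k]) / n_samples
            w -= lr * grad
        local_models.append(w)
    # Server phase: average the returned models (the FedAvg update).
    w_global = np.mean(local_models, axis=0)

print("global model after FedAvg:", np.round(w_global[:3], 3))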