Decentralized Graph-Based Multi-Agent Reinforcement Learning Using
Reward Machines
- URL: http://arxiv.org/abs/2110.00096v1
- Date: Thu, 30 Sep 2021 21:41:55 GMT
- Title: Decentralized Graph-Based Multi-Agent Reinforcement Learning Using
Reward Machines
- Authors: Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, and
Yongming Liu
- Abstract summary: We use a reward machine to encode each agent's task and expose the internal structure of the reward function.
We propose a decentralized graph-based reinforcement learning algorithm that equips each agent with a localized policy.
The effectiveness of the proposed DGRM algorithm is evaluated in two case studies, UAV package delivery and COVID-19 pandemic mitigation.
- Score: 5.34590273802424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning (MARL), it is challenging for a
collection of agents to learn complex, temporally extended tasks. The
difficulties lie in the computational complexity and in learning the high-level
structure behind reward functions. We study the graph-based Markov Decision
Process (MDP), where the dynamics of neighboring agents are coupled. We use a
reward machine (RM) to encode each agent's task and to expose the internal
structure of the reward function. An RM can describe high-level knowledge and
encode non-Markovian reward functions. To tackle the computational complexity,
we propose a decentralized learning algorithm, decentralized graph-based
reinforcement learning using reward machines (DGRM), which equips each agent
with a localized policy so that agents make decisions independently, based only
on locally available information. DGRM uses an actor-critic structure, and we
introduce a tabular Q-function for discrete-state problems.
We show that the dependency of each agent's Q-function on other agents decays
exponentially as the distance between them increases. Furthermore, the
complexity of DGRM scales with the local information size of the largest
$\kappa$-hop neighborhood, and DGRM can find an
$O(\rho^{\kappa+1})$-approximation of a stationary point of the objective
function. To further improve efficiency, we also propose the deep DGRM
algorithm, which uses deep neural networks to approximate the Q-function and
the policy function for large-scale or continuous-state problems. The
effectiveness of the proposed DGRM algorithm is evaluated in two case studies,
UAV package delivery and COVID-19 pandemic mitigation. Experimental results
show that local information is sufficient for DGRM and that agents can
accomplish complex tasks with the help of RMs. DGRM improves the global
accumulated reward by 119% over the baseline in the COVID-19 pandemic
mitigation case.
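To make the reward-machine idea concrete, the sketch below encodes a
pickup-then-deliver task as a two-transition finite-state machine, loosely in
the spirit of the UAV package-delivery case study. The class name, event
labels, and reward values are illustrative assumptions, not the paper's
implementation.

# A minimal reward-machine sketch for a pickup-then-deliver task.
# All names and reward values here are illustrative assumptions.

class RewardMachine:
    """Finite-state machine mapping (RM state, event label) to
    (next RM state, reward). The reward is non-Markovian with respect
    to the environment: 'deliver' pays off only after 'pickup'."""

    def __init__(self):
        # delta: (rm_state, label) -> (next_rm_state, reward)
        self.delta = {
            ("u0", "pickup"):  ("u1", 0.0),  # package picked up, no reward yet
            ("u1", "deliver"): ("u2", 1.0),  # delivery after pickup: reward
        }
        self.state = "u0"
        self.terminal = {"u2"}

    def step(self, label):
        """Advance the RM on an event label emitted by the environment."""
        next_state, reward = self.delta.get(
            (self.state, label), (self.state, 0.0))  # self-loop on other labels
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.terminal


rm = RewardMachine()
print(rm.step("deliver"))  # 0.0 -- delivering before pickup earns nothing
print(rm.step("pickup"))   # 0.0
print(rm.step("deliver"))  # 1.0 -- the non-Markovian reward fires
print(rm.done())           # True

Because the reward depends on the history of events (delivery pays off only
after pickup), no memoryless Markovian reward over environment states could
express it; the RM state carries exactly the missing history.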
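The $\kappa$-hop truncation can be sketched in the same spirit. The snippet
below is a minimal illustration, under assumed names (k_hop_neighborhood,
TruncatedQ) and an assumed SARSA-style critic update, of how each agent's
tabular Q-function is indexed only by its $\kappa$-hop neighborhood rather
than the global state; it is not the paper's exact algorithm.

# Sketch of the kappa-hop truncation behind DGRM's complexity result:
# each agent's tabular Q-function is keyed only by the joint state and
# action of its kappa-hop neighborhood, never by the full global state.

from collections import defaultdict

def k_hop_neighborhood(adj, i, kappa):
    """Agents reachable from agent i within kappa hops (including i)."""
    frontier, seen = {i}, {i}
    for _ in range(kappa):
        frontier = {j for u in frontier for j in adj[u]} - seen
        seen |= frontier
    return tuple(sorted(seen))

class TruncatedQ:
    """Tabular critic for one agent, truncated to its kappa-hop scope.
    In DGRM each neighbor's RM state would be folded into its local
    state; here a single integer stands in for both (an assumption)."""

    def __init__(self, adj, i, kappa, alpha=0.1, gamma=0.95):
        self.scope = k_hop_neighborhood(adj, i, kappa)
        self.q = defaultdict(float)  # (local_states, local_actions) -> value
        self.alpha, self.gamma = alpha, gamma

    def key(self, states, actions):
        """Project global state/action profiles onto the local scope."""
        return (tuple(states[j] for j in self.scope),
                tuple(actions[j] for j in self.scope))

    def update(self, s, a, reward, s2, a2):
        """One SARSA-style update on the truncated Q-table."""
        k, k2 = self.key(s, a), self.key(s2, a2)
        self.q[k] += self.alpha * (reward + self.gamma * self.q[k2] - self.q[k])

# Line graph over 5 agents; with kappa=1, agent 2 only ever sees {1, 2, 3}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
critic = TruncatedQ(adj, i=2, kappa=1)
print(critic.scope)  # (1, 2, 3)
critic.update([0, 1, 0, 2, 1], [0, 0, 1, 0, 0], 1.0,
              [0, 1, 1, 2, 1], [0, 1, 0, 0, 0])

Since the Q-function's dependence on other agents decays exponentially with
distance, this truncation loses only an $O(\rho^{\kappa+1})$ term, while the
table size grows with the largest $\kappa$-hop neighborhood instead of with
the total number of agents.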
Related papers
- A Federated Online Restless Bandit Framework for Cooperative Resource Allocation [23.698976872351576]
We study the cooperative resource allocation problem with unknown system dynamics of MRPs.
We put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines.
arXiv Detail & Related papers (2024-06-12T08:34:53Z)
- Risk-Aware Distributed Multi-Agent Reinforcement Learning [8.287693091673658]
We develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions.
We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that the value functions of individual agents reach consensus.
arXiv Detail & Related papers (2023-04-04T17:56:44Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
arXiv Detail & Related papers (2022-05-31T12:39:24Z)
- Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach [130.9259586568977]
We propose novel learning algorithms to recover the dynamic Vickrey-Clarke-Groves (VCG) mechanism over multiple rounds of interaction.
A key contribution of our approach is incorporating reward-free online Reinforcement Learning (RL) to aid exploration over a rich policy space.
arXiv Detail & Related papers (2022-02-25T16:17:23Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Provably Efficient Multi-Agent Reinforcement Learning with Fully Decentralized Communication [3.5450828190071655]
Distributed exploration reduces sampling complexity in reinforcement learning.
We show that group performance can be significantly improved when each agent uses a decentralized message-passing protocol.
We show that incorporating more agents and more information sharing into the group learning scheme speeds up convergence to the optimal policy.
arXiv Detail & Related papers (2021-10-14T14:27:27Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system.
Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers.
We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
arXiv Detail & Related papers (2021-05-10T08:39:56Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion Multi-Agent MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)