Context-Aware Bayesian Network Actor-Critic Methods for Cooperative
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2306.01920v1
- Date: Fri, 2 Jun 2023 21:22:27 GMT
- Title: Context-Aware Bayesian Network Actor-Critic Methods for Cooperative
Multi-Agent Reinforcement Learning
- Authors: Dingyang Chen, Qi Zhang
- Abstract summary: We introduce a Bayesian network to establish correlations between agents' action selections in their joint policy.
We develop practical algorithms to learn the context-aware Bayesian network policies.
Empirical results on a range of MARL benchmarks show the benefits of our approach.
- Score: 7.784991832712813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Executing actions in a correlated manner is a common strategy for human
coordination that often leads to better cooperation, which is also potentially
beneficial for cooperative multi-agent reinforcement learning (MARL). However,
the recent success of MARL relies heavily on the convenient paradigm of purely
decentralized execution, where there is no action correlation among agents for
scalability considerations. In this work, we introduce a Bayesian network to
establish correlations between agents' action selections in their joint
policy. We provide a theoretical justification for why action
dependencies are beneficial by deriving the multi-agent policy gradient formula
under such a Bayesian network joint policy and proving its global convergence
to Nash equilibria under tabular softmax policy parameterization in cooperative
Markov games. Further, by equipping existing MARL algorithms with a recent
method of differentiable directed acyclic graphs (DAGs), we develop practical
algorithms to learn the context-aware Bayesian network policies in scenarios
with partial observability and of varying difficulty. We also dynamically
sparsify the learned DAG throughout the training process, which leads to
weakly correlated or even fully independent policies for decentralized execution.
Empirical results on a range of MARL benchmarks show the benefits of our
approach.
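To make the construction concrete, here is a minimal sketch (not the authors' code) of a DAG-factorized joint policy: agents sample actions in a topological order of a hypothetical fixed DAG, and each agent's softmax policy conditions on its own observation plus one-hot encodings of its parents' actions. The sizes, the graph, and the parameterization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 agents with discrete actions and a fixed DAG over
# agents. An edge i -> j means agent j conditions its action on agent i's
# sampled action. All sizes and the graph are illustrative assumptions.
N_AGENTS, N_ACTIONS, OBS_DIM = 3, 4, 5
parents = {0: [], 1: [0], 2: [0, 1]}  # keys listed in a topological order

# One softmax head per agent: logits depend on the agent's own observation
# plus one-hot encodings of its parents' actions (the "context").
params = {
    i: rng.normal(0.0, 0.1, size=(OBS_DIM + len(parents[i]) * N_ACTIONS, N_ACTIONS))
    for i in range(N_AGENTS)
}

def sample_joint_action(params, obs):
    """Sample a joint action in topological order of the DAG."""
    actions, joint_logp = {}, 0.0
    for i in range(N_AGENTS):
        feats = [obs[i]] + [np.eye(N_ACTIONS)[actions[p]] for p in parents[i]]
        logits = np.concatenate(feats) @ params[i]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = int(rng.choice(N_ACTIONS, p=probs))
        actions[i] = a
        # The joint log-prob factorizes over the DAG, so the score-function
        # policy gradient is a sum of per-agent terms.
        joint_logp += np.log(probs[a])
    return actions, joint_logp

obs = {i: rng.normal(size=OBS_DIM) for i in range(N_AGENTS)}
print(sample_joint_action(params, obs))
```

Because the joint log-probability is a sum of per-agent terms, the score-function policy gradient decomposes over agents, which is the general shape of the policy-gradient formula the abstract refers to; pruning every edge from the DAG recovers fully independent policies suitable for decentralized execution.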
Related papers
- Mixed Q-Functionals: Advancing Value-Based Methods in Cooperative MARL with Continuous Action Domains [0.0]
We propose a novel multi-agent value-based algorithm, Mixed Q-Functionals (MQF), inspired by the idea of Q-Functionals.
Our algorithm fosters collaboration among agents by mixing their action-values.
Our empirical findings reveal that MQF outperforms four variants of Deep Deterministic Policy Gradient.
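The summary does not spell out the mixing function, but the named ingredients suggest a simple picture. Below is a hedged sketch in which each agent's "Q-functional" maps a state to coefficients over a polynomial basis of its continuous action, and the per-agent action-values are mixed by summation; the basis, sizes, and additive mixer are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

def action_basis(act, degree=3):
    # Polynomial features of the action vector: [1, a, a^2, a^3] per dim.
    return torch.cat([act**k for k in range(degree + 1)], dim=-1)

class QFunctional(nn.Module):
    """Maps a state to a function over continuous actions via basis coefficients."""
    def __init__(self, obs_dim, act_dim, degree=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim * (degree + 1)),
        )

    def forward(self, obs, act):
        coeffs = self.net(obs)                      # state -> basis coefficients
        return (coeffs * action_basis(act)).sum(-1, keepdim=True)

def mixed_q(agent_qs, obs_list, act_list):
    """Joint action-value as the (assumed additive) mix of per-agent values."""
    return sum(q(o, a) for q, o, a in zip(agent_qs, obs_list, act_list))
```

One appeal of the functional form is that the coefficients are computed once per state, so many candidate continuous actions can be scored cheaply.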
arXiv Detail & Related papers (2024-02-12T16:21:50Z)
- Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
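For readers unfamiliar with reward machines, here is a minimal toy illustration (the events and task below are invented, not from the paper): an RM is a finite-state machine whose transitions fire on high-level events and emit rewards, so a reward that is non-Markovian in raw observations becomes Markovian once the RM state is tracked.

```python
# A reward machine: finite states, transitions that fire on high-level events
# and emit rewards. Tracking the RM state makes the reward Markovian again.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        self.transitions = transitions  # (state, event) -> (next_state, reward)
        self.state = initial_state

    def step(self, event):
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)  # unknown events: no-op
        )
        return reward

# Invented sub-task: "press the button, then open the door".
rm = RewardMachine(
    {("u0", "button_pressed"): ("u1", 0.0),
     ("u1", "door_opened"): ("u_done", 1.0)},
    initial_state="u0",
)
print(rm.step("door_opened"))     # 0.0: out of order, RM stays in u0
print(rm.step("button_pressed"))  # 0.0: progress to u1
print(rm.step("door_opened"))     # 1.0: sub-task completed
```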
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning [9.290757451344673]
We present a risk-based exploration scheme that leads to collaboratively optimistic behavior by shifting the sampling region of the estimated return distribution.
Built on quantile regression, our method shows remarkable performance in multi-agent settings that require cooperative exploration.
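As a rough illustration of acting optimistically on a quantile-regression critic (this is not the paper's exact estimator; the tail fraction and shapes are assumptions), one can score each action by the mean of the upper quantiles of its predicted return distribution rather than the full mean:

```python
import numpy as np

rng = np.random.default_rng(1)

def optimistic_action(quantiles, upper_frac=0.25):
    """quantiles: array [n_actions, n_quantiles], sorted along the last axis."""
    n_q = quantiles.shape[1]
    k = max(1, int(upper_frac * n_q))
    # Risk-seeking statistic: average only the upper tail of each action's
    # predicted return distribution instead of its full mean.
    upper_tail_mean = quantiles[:, -k:].mean(axis=1)
    return int(np.argmax(upper_tail_mean))

# Fake critic output for 4 actions with 32 quantiles each.
q = np.sort(rng.normal(size=(4, 32)), axis=1)
print(optimistic_action(q))
```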
arXiv Detail & Related papers (2023-03-03T08:17:57Z)
- Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework [26.612749327414335]
Decentralized execution is a core demand in cooperative multi-agent reinforcement learning (MARL).
In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods.
We show that TAD-PPO can theoretically perform optimal policy learning in finite multi-agent MDPs and significantly outperforms baselines on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
- Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning [11.91425153754564]
We show that in environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases.
We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
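A standard toy illustration of such a multi-modal landscape (the exact example here is ours, not quoted from the summary) is the 2x2 XOR-style game, where the reward is 1 exactly when the two agents pick different actions; a least-squares fit shows that no additive value decomposition Q(a1, a2) = q1(a1) + q2(a2) can represent it:

```python
import numpy as np

# The 2x2 XOR-style game: reward 1 iff the agents choose different actions.
# Both (0,1) and (1,0) are optimal, giving a multi-modal reward landscape.
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Best additive decomposition Q(a1, a2) = q1(a1) + q2(a2), fit by least
# squares: one row per joint action, indicator features per agent action.
A = np.array([[1, 0, 1, 0],   # (a1=0, a2=0)
              [1, 0, 0, 1],   # (a1=0, a2=1)
              [0, 1, 1, 0],   # (a1=1, a2=0)
              [0, 1, 0, 1]],  # (a1=1, a2=1)
             dtype=float)
theta = np.linalg.lstsq(A, R.flatten(), rcond=None)[0]
fit = (A @ theta).reshape(2, 2)
print(fit)                     # every entry is ~0.5: both optima are lost
print(np.abs(fit - R).max())   # irreducible approximation error of 0.5
```

The best additive fit is flat at 0.5, so greedy decentralized actors derived from it get no signal to commit to either optimal mode, whereas an individual-policy PG method can converge to one of them.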
arXiv Detail & Related papers (2022-06-15T13:03:05Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
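In the tabular case the stated observation is easy to verify; the toy MDP below is invented for illustration. The successor representation of the state-action chain is (I - gamma*M)^(-1), and multiplying it by the initial state-action distribution yields the discounted occupancy, from which the MIS density ratio follows:

```python
import numpy as np

n_s, n_a, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)

P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = dist over next s
pi = rng.dirichlet(np.ones(n_a), size=n_s)        # target policy pi[s] over a
mu = rng.dirichlet(np.ones(n_a), size=n_s)        # sampling policy
rho0 = np.ones(n_s) / n_s                         # initial state distribution

def occupancy(policy):
    # State-action transitions M[(s,a),(s',a')] = P[s,a,s'] * policy[s',a'].
    M = (P[:, :, :, None] * policy[None, None, :, :]).reshape(n_s * n_a, -1)
    # Tabular successor representation of the state-action chain.
    psi = np.linalg.inv(np.eye(n_s * n_a) - gamma * M)
    d0 = (rho0[:, None] * policy).reshape(-1)     # initial state-action dist
    return (1.0 - gamma) * d0 @ psi               # discounted occupancy d(s,a)

ratio = occupancy(pi) / occupancy(mu)             # MIS density ratio d_pi / d_mu
print(ratio.reshape(n_s, n_a))
```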
arXiv Detail & Related papers (2021-06-12T20:21:38Z)
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC).
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations [51.8796674904734]
We propose a new cooperative learning framework with pre-trained heterogeneous observation representations.
We employ an encoder-decoder based graph attention network to learn the intricate interactions and heterogeneous representations.
arXiv Detail & Related papers (2020-12-24T04:52:29Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)