Multi-Agent Constrained Policy Optimisation
- URL: http://arxiv.org/abs/2110.02793v1
- Date: Wed, 6 Oct 2021 14:17:09 GMT
- Title: Multi-Agent Constrained Policy Optimisation
- Authors: Shangding Gu, Jakub Grudzien Kuba, Munning Wen, Ruiqing Chen, Ziyan
Wang, Zheng Tian, Jun Wang, Alois Knoll, Yaodong Yang
- Abstract summary: We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
Our solutions -- Multi-Agent Constrained Policy optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning.
We develop the benchmark suite of Safe Multi-Agent MuJoCo that involves a variety of MARL baselines.
- Score: 17.772811770726296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Developing reinforcement learning algorithms that satisfy safety constraints
is becoming increasingly important in real-world applications. In multi-agent
reinforcement learning (MARL) settings, policy optimisation with safety
awareness is particularly challenging because each individual agent has to not
only meet its own safety constraints, but also consider those of others so that
their joint behaviour can be guaranteed safe. Despite its importance, the
problem of safe multi-agent learning has not been rigorously studied; very few
solutions have been proposed, nor a sharable testing environment or benchmarks.
To fill these gaps, in this work, we formulate the safe MARL problem as a
constrained Markov game and solve it with policy optimisation methods. Our
solutions -- Multi-Agent Constrained Policy Optimisation (MACPO) and
MAPPO-Lagrangian -- leverage the theories from both constrained policy
optimisation and multi-agent trust region learning. Crucially, our methods
enjoy theoretical guarantees of both monotonic improvement in reward and
satisfaction of safety constraints at every iteration. To examine the
effectiveness of our methods, we develop the benchmark suite of Safe
Multi-Agent MuJoCo that involves a variety of MARL baselines. Experimental
results justify that MACPO/MAPPO-Lagrangian can consistently satisfy safety
constraints, meanwhile achieving comparable performance to strong baselines.
Related papers
- Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks.
deploying MARL agents in real-world applications presents critical safety challenges.
We propose a novel theoretical framework for safe MARL with $textitstate-wise$ constraints, where safety requirements are enforced at every state the agents visit.
For practical deployment in complex high-dimensional systems, we propose $textitMulti-Agent Dual Actor-Critic$ (MADAC)
arXiv Detail & Related papers (2024-11-22T16:08:42Z) - Multi-Agent Reinforcement Learning with Control-Theoretic Safety Guarantees for Dynamic Network Bridging [0.11249583407496219]
This work introduces a hybrid approach that integrates Multi-Agent Reinforcement Learning with control-theoretic methods to ensure safe and efficient distributed strategies.
Our contributions include a novel setpoint update algorithm that dynamically adjusts agents' positions to preserve safety conditions without compromising the mission's objectives.
arXiv Detail & Related papers (2024-04-02T01:30:41Z) - DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe
Multi-Agent Reinforcement Learning [11.407941376728258]
We propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC)
The key insight of DeepSafeMPC is leveraging a entralized deep learning model to well predict environmental dynamics.
We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment.
arXiv Detail & Related papers (2024-03-11T03:17:33Z) - Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning [13.245000585002858]
In many real-world applications, a reinforcement learning (RL) agent should consider multiple objectives and adhere to safety guidelines.
We propose a constrained multi-objective gradient aggregation algorithm named Constrained Multi-Objective Gradient Aggregator (CoGAMO)
arXiv Detail & Related papers (2024-03-01T04:57:13Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Robust Multi-Agent Reinforcement Learning via Adversarial
Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
arXiv Detail & Related papers (2023-10-16T20:14:06Z) - Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z) - Constrained Policy Optimization via Bayesian World Models [79.0077602277004]
LAMBDA is a model-based approach for policy optimization in safety critical tasks modeled via constrained Markov decision processes.
We demonstrate LAMBDA's state of the art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.
arXiv Detail & Related papers (2022-01-24T17:02:22Z) - Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy optimisation (HATPRO) and Heterogeneous-Agent Proximal Policy optimisation (
arXiv Detail & Related papers (2021-09-23T09:44:35Z) - Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety
Constraints in Finite MDPs [71.47895794305883]
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI for this RL setting that takes into account the preferences of the algorithm's user for handling the trade-offs for different reward signals.
arXiv Detail & Related papers (2021-05-31T21:04:21Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.