Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
- URL: http://arxiv.org/abs/2306.17052v2
- Date: Thu, 28 Dec 2023 02:40:37 GMT
- Title: Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
- Authors: Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco
Corman, Andreas Krause and Ilija Bogunovic
- Abstract summary: Mean-field reinforcement learning optimizes the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
- Score: 48.667697255912614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications, e.g., in shared mobility, require coordinating a large
number of agents. Mean-field reinforcement learning addresses the resulting
scalability challenge by optimizing the policy of a representative agent
interacting with the infinite population of identical agents instead of
considering individual pairwise interactions. In this paper, we address an
important generalization where there exist global constraints on the
distribution of agents (e.g., requiring capacity constraints or minimum
coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first
model-based mean-field reinforcement learning algorithm that attains safe
policies even in the case of unknown transitions. As a key ingredient, it uses
epistemic uncertainty in the transition model within a log-barrier approach to
ensure pessimistic constraints satisfaction with high probability. Beyond the
synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle
repositioning problem faced by many shared mobility operators and evaluate its
performance through simulations built on vehicle trajectory data from a service
provider in Shenzhen. Our algorithm effectively meets the demand in critical
areas while ensuring service accessibility in regions with low demand.
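The log-barrier idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the scalar constraint value `h_nominal`, the scalar `uncertainty`, and the weights `lam` and `beta` are all assumptions made for illustration. The sketch only shows how subtracting an uncertainty term from a constraint value gives a pessimistic margin, and how a log-barrier on that margin penalizes policies that come close to violating the constraint.

```python
import math


def pessimistic_margin(h_nominal, uncertainty, beta=1.0):
    """Hypothetical pessimistic constraint value.

    h_nominal:   constraint value under the learned model (safe if >= 0).
    uncertainty: scalar stand-in for the model's epistemic uncertainty.
    beta:        assumed weight on the uncertainty penalty.
    """
    return h_nominal - beta * uncertainty


def log_barrier_objective(reward, h_nominal, uncertainty, lam=0.1, beta=1.0):
    """Reward plus a log-barrier on the pessimistic margin (illustrative only).

    Returns -inf when the pessimistic margin is not strictly positive,
    so such candidates are never preferred by a maximizer.
    """
    margin = pessimistic_margin(h_nominal, uncertainty, beta)
    if margin <= 0.0:
        return -math.inf
    return reward + lam * math.log(margin)
```

Under these assumptions, a larger uncertainty shrinks the margin and lowers the objective, so a maximizer is pushed toward policies whose constraint satisfaction holds even under the model's worst plausible error.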
Related papers
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a novel diffusion-based controllable closed-loop safety-critical simulation framework.
We develop a novel approach to simulate safety-critical scenarios through an adversarial term in the denoising process.
We validate our framework empirically using the NuScenes dataset, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints [1.1549572298362787]
We propose a momentum-based decentralized gradient policy method, DePAint, to solve the problem.
This is the first privacy-preserving fully decentralized multi-agent reinforcement learning algorithm that considers both peak and average constraints.
arXiv Detail & Related papers (2023-10-22T16:36:03Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multiagent sim-to-real gaps.
In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)
- Combining Propositional Logic Based Decision Diagrams with Decision Making in Urban Systems [10.781866671930851]
We tackle the problem of multiagent pathfinding under uncertainty and partial observability.
We use propositional logic and integrate it with the RL algorithms to enable fast simulation for RL.
arXiv Detail & Related papers (2020-11-09T13:13:54Z)
- A Deep Multi-Agent Reinforcement Learning Approach to Autonomous Separation Assurance [5.196149362684628]
A novel deep multi-agent reinforcement learning framework is proposed to identify and resolve conflicts among a variable number of aircraft.
The proposed framework is validated on three challenging case studies in the BlueSky air traffic control environment.
arXiv Detail & Related papers (2020-03-17T16:50:34Z)