Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
- URL: http://arxiv.org/abs/2306.17052v2
- Date: Thu, 28 Dec 2023 02:40:37 GMT
- Title: Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
- Authors: Matej Jusup, Barna Pásztor, Tadeusz Janik, Kenan Zhang, Francesco
Corman, Andreas Krause and Ilija Bogunovic
- Abstract summary: Mean-field reinforcement learning addresses scalability by optimizing the policy of a representative agent interacting with an infinite population of identical agents.
We propose Safe-M$^3$-UCRL, the first model-based mean-field reinforcement learning algorithm that attains safe policies even in the case of unknown transitions.
Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.
- Score: 48.667697255912614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications, e.g., in shared mobility, require coordinating a large
number of agents. Mean-field reinforcement learning addresses the resulting
scalability challenge by optimizing the policy of a representative agent
interacting with the infinite population of identical agents instead of
considering individual pairwise interactions. In this paper, we address an
important generalization where there exist global constraints on the
distribution of agents (e.g., requiring capacity constraints or minimum
coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first
model-based mean-field reinforcement learning algorithm that attains safe
policies even in the case of unknown transitions. As a key ingredient, it uses
epistemic uncertainty in the transition model within a log-barrier approach to
ensure pessimistic constraint satisfaction with high probability. Beyond the
synthetic swarm motion benchmark, we showcase Safe-M$^3$-UCRL on the vehicle
repositioning problem faced by many shared mobility operators and evaluate its
performance through simulations built on vehicle trajectory data from a service
provider in Shenzhen. Our algorithm effectively meets the demand in critical
areas while ensuring service accessibility in regions with low demand.
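To make the key ingredient concrete: a minimal sketch of the log-barrier idea, assuming a scalar constraint h(μ) ≥ 0 on the agent distribution whose value under the learned transition model has an epistemic posterior with mean and standard deviation (e.g., from a Gaussian process or an ensemble). All names and the β, η parameters below are illustrative, not the authors' implementation.

```python
import numpy as np

def log_barrier_objective(expected_return, constraint_mean, constraint_std,
                          beta=2.0, eta=0.1):
    """Hypothetical sketch: augment a policy's expected return with a
    log-barrier on a *pessimistic* estimate of a constraint h(mu) >= 0.

    constraint_mean / constraint_std describe the epistemic posterior over
    the constraint value under the learned transition model; subtracting
    beta standard deviations makes the barrier act before the true
    constraint can plausibly be violated.
    """
    pessimistic = constraint_mean - beta * constraint_std
    if pessimistic <= 0:
        return -np.inf  # reject policies that may be unsafe under the model
    # the barrier term tends to minus infinity as the pessimistic estimate
    # approaches the constraint boundary
    return expected_return + eta * np.log(pessimistic)
```

As η shrinks, the barrier-penalized optimum approaches the constrained optimum, while β controls how conservatively the epistemic uncertainty is treated.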
Related papers
- Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
Bottom Up Network (BUN) treats the collective of agents as a unified entity.
Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z) - Agent-Agnostic Centralized Training for Decentralized Multi-Agent Cooperative Driving [17.659812774579756]
We propose an asymmetric actor-critic model that learns decentralized cooperative driving policies for autonomous vehicles.
By employing attention neural networks with masking, our approach efficiently manages real-world traffic dynamics and partial observability.
arXiv Detail & Related papers (2024-03-18T16:13:02Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
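A rough sketch of one way such suppression could be wired up, assuming the safety critic outputs a scalar estimate; this is a hypothetical gating rule, not the paper's exact mechanism:

```python
import math

def suppressed_objective(task_q, safety_q, margin=0.0, sharpness=5.0):
    """Hypothetical adaptive suppression: a gate in (0, 1) shrinks the
    task-reward objective as the safety critic's estimate falls toward
    the margin, while the safety term itself stays fully active."""
    gate = 1.0 / (1.0 + math.exp(-sharpness * (safety_q - margin)))
    return gate * task_q + safety_q
```

The gate saturates near 1 when the safety critic reports a comfortable margin, so the agent optimizes the task almost unconstrained in clearly safe states and prioritizes safety elsewhere.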
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z) - DePAint: A Decentralized Safe Multi-Agent Reinforcement Learning Algorithm considering Peak and Average Constraints [1.1549572298362787]
We propose DePAint, a momentum-based decentralized policy gradient method, to solve the problem.
This is the first privacy-preserving fully decentralized multi-agent reinforcement learning algorithm that considers both peak and average constraints.
arXiv Detail & Related papers (2023-10-22T16:36:03Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement
Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
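The description above pins down the composition; a minimal sketch under that reading, with illustrative names rather than the authors' code:

```python
def multiplicative_value(reward_q, violation_prob):
    """The reward critic's constraint-free return estimate is discounted
    multiplicatively by the safety critic's predicted probability of
    constraint violation."""
    return (1.0 - violation_prob) * reward_q

# A risky high-return action can be valued below a safe modest one:
print(multiplicative_value(10.0, 0.3))  # ~7.0
print(multiplicative_value(8.0, 0.0))   # 8.0
```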
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via
Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z) - ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in
Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to multi-agent sim-to-real gaps in the reward function.
In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z) - Combining Propositional Logic Based Decision Diagrams with Decision
Making in Urban Systems [10.781866671930851]
We tackle the problem of multiagent pathfinding under uncertainty and partial observability.
We use propositional logic based decision diagrams and integrate them with RL algorithms to enable fast simulation.
arXiv Detail & Related papers (2020-11-09T13:13:54Z)