Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving
- URL: http://arxiv.org/abs/2405.18209v1
- Date: Tue, 28 May 2024 14:15:18 GMT
- Title: Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving
- Authors: Zhi Zheng, Shangding Gu
- Abstract summary: We propose a safe MARL method grounded in a Stackelberg model with bi-level optimization.
We develop two practical algorithms, namely Constrained Stackelberg Q-learning (CSQ) and Constrained Stackelberg Multi-Agent Deep Deterministic Policy Gradient (CS-MADDPG).
Our algorithms, CSQ and CS-MADDPG, outperform several strong MARL baselines, such as Bi-AC, MACPO, and MAPPO-L, in terms of reward and safety performance.
- Score: 3.5293763645151404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring safety in MARL, particularly when deploying it in real-world applications such as autonomous driving, emerges as a critical challenge. To address this challenge, traditional safe MARL methods extend MARL approaches to incorporate safety considerations, aiming to minimize safety risk values. However, these safe MARL algorithms often fail to model other agents and lack convergence guarantees, particularly in dynamically complex environments. In this study, we propose a safe MARL method grounded in a Stackelberg model with bi-level optimization, for which convergence analysis is provided. Derived from our theoretical analysis, we develop two practical algorithms, namely Constrained Stackelberg Q-learning (CSQ) and Constrained Stackelberg Multi-Agent Deep Deterministic Policy Gradient (CS-MADDPG), designed to facilitate MARL decision-making in autonomous driving applications. To evaluate the effectiveness of our algorithms, we developed a safe MARL autonomous driving benchmark and conducted experiments on challenging autonomous driving scenarios, such as merges, roundabouts, intersections, and racetracks. The experimental results indicate that our algorithms, CSQ and CS-MADDPG, outperform several strong MARL baselines, such as Bi-AC, MACPO, and MAPPO-L, regarding reward and safety performance. The demos and source code are available at {https://github.com/SafeRL-Lab/Safe-MARL-in-Autonomous-Driving.git}.
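To make the bilevel structure concrete, the following is a minimal, hypothetical tabular sketch of a constrained Stackelberg Q-learning step in the spirit of CSQ: the leader selects the reward-maximizing action whose estimated cost-to-go respects a budget, the follower best-responds to that action, and the Q-tables are backed up toward the resulting equilibrium. All names, table shapes, and the cost budget d are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Hypothetical tabular sketch of one constrained Stackelberg Q-learning step.
# Qr_*: reward Q-tables, Qc_L: leader cost Q-table, indexed [state, a_leader, a_follower].
def csq_step(Qr_L, Qc_L, Qr_F, s, a_L, a_F, r_L, c_L, r_F, s_next,
             d=1.0, alpha=0.1, gamma=0.99):
    """One illustrative constrained Stackelberg Q-learning update (not the paper's code)."""
    n_leader = Qr_L.shape[1]

    # Follower best response to each candidate leader action in the next state.
    br = np.array([np.argmax(Qr_F[s_next, aL, :]) for aL in range(n_leader)])

    # Leader picks the reward-maximizing action whose estimated cost-to-go
    # respects the budget d; fall back to the least-cost action if none does.
    feasible = [aL for aL in range(n_leader) if Qc_L[s_next, aL, br[aL]] <= d]
    if feasible:
        aL_star = max(feasible, key=lambda aL: Qr_L[s_next, aL, br[aL]])
    else:
        aL_star = int(np.argmin([Qc_L[s_next, aL, br[aL]] for aL in range(n_leader)]))
    aF_star = br[aL_star]

    # Standard TD backups toward the Stackelberg equilibrium values.
    Qr_L[s, a_L, a_F] += alpha * (r_L + gamma * Qr_L[s_next, aL_star, aF_star] - Qr_L[s, a_L, a_F])
    Qc_L[s, a_L, a_F] += alpha * (c_L + gamma * Qc_L[s_next, aL_star, aF_star] - Qc_L[s, a_L, a_F])
    Qr_F[s, a_L, a_F] += alpha * (r_F + gamma * Qr_F[s_next, aL_star, aF_star] - Qr_F[s, a_L, a_F])
```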
Related papers
- Empowering Autonomous Driving with Large Language Models: A Safety Perspective [82.90376711290808]
This paper explores the integration of Large Language Models (LLMs) into Autonomous Driving systems.
LLMs are intelligent decision-makers in behavioral planning, augmented with a safety verifier shield for contextual safety learning.
We present two key studies in a simulated environment: an adaptive LLM-conditioned Model Predictive Control (MPC) and an LLM-enabled interactive behavior planning scheme with a state machine.
arXiv Detail & Related papers (2023-11-28T03:13:09Z)
- Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms [79.61176746380718]
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains.
MARL policies often lack robustness and are sensitive to small changes in their environment.
We show that we can gain robustness by controlling a policy's Lipschitz constant.
We propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies.
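As a rough illustration of the Lipschitz-continuity idea (not ERNIE's actual code), one can regularize the policy by penalizing how far its output moves under a small, adversarially chosen observation perturbation; the perturbation budget, step count, and function names below are assumptions.

```python
import torch

def adversarial_smoothness_penalty(policy, obs, eps=0.05, steps=3, step_size=0.01):
    """Penalty on how far the policy output moves under a worst-case bounded
    observation perturbation (illustrative sketch of the Lipschitz idea)."""
    with torch.no_grad():
        clean_out = policy(obs)
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        # Projected gradient ascent on the perturbation.
        shift = torch.norm(policy(obs + delta) - clean_out, dim=-1).mean()
        grad, = torch.autograd.grad(shift, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()
            delta.clamp_(-eps, eps)
    # Add this term (times a coefficient) to the usual MARL policy loss.
    return torch.norm(policy(obs + delta) - clean_out, dim=-1).mean()
```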
arXiv Detail & Related papers (2023-10-16T20:14:06Z)
- Robust Multi-Agent Reinforcement Learning with State Uncertainty [17.916400875478377]
We study the problem of MARL with state uncertainty in this work.
We propose a robust multi-agent Q-learning algorithm to find such an equilibrium.
Our experiments show that the proposed RMAQ algorithm converges to the optimal value function.
arXiv Detail & Related papers (2023-07-30T12:31:42Z)
- Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
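The mechanism admits a compact sketch: a safety critic outputs the probability of remaining constraint-free, and the value used for learning is the reward critic's constraint-free estimate scaled by that probability. The layer sizes and class name below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiplicativeValue(nn.Module):
    """Illustrative multiplicative value function: V(s) = V_reward(s) * P_safe(s)."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.reward_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.safety_critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, obs):
        v_reward = self.reward_critic(obs)                # constraint-free return estimate
        p_safe = torch.sigmoid(self.safety_critic(obs))   # prob. of no constraint violation
        return v_reward * p_safe                          # safety-discounted value
```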
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning [10.132303690998523]
It is crucial to test the robustness of c-MARL algorithms before they are deployed in the real world.
Existing adversarial attacks for MARL could be used for testing, but they are limited to a single robustness aspect.
We propose MARLSafe, the first robustness testing framework for c-MARL algorithms.
arXiv Detail & Related papers (2022-04-17T05:15:51Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Safety-aware Policy Optimisation for Autonomous Racing [17.10371721305536]
We introduce Hamilton-Jacobi (HJ) reachability theory into the constrained Markov decision process (CMDP) framework.
We demonstrate that the HJ safety value can be learned directly on vision context.
We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently-released high-fidelity autonomous racing environment.
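For intuition, a discounted Hamilton-Jacobi safety value can be computed with a Bellman-like backup in which a state is only as safe as the worse of its own safety margin and the best reachable future value. The toy deterministic dynamics and names below are assumptions, not the paper's learned, vision-based variant.

```python
import numpy as np

def hj_safety_backup(V, h, transitions, gamma=0.99):
    """Discounted HJ safety Bellman backup (illustrative sketch).
    V: current safety value per state; h[s]: signed safety margin (>= 0 is safe);
    transitions[s][a] -> next state index (toy deterministic dynamics assumed)."""
    V_new = np.empty_like(V)
    for s in range(len(V)):
        best_next = max(V[transitions[s][a]] for a in range(len(transitions[s])))
        # A state is only as safe as the worse of its margin and its best reachable future.
        V_new[s] = (1.0 - gamma) * h[s] + gamma * min(h[s], best_next)
    return V_new
```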
arXiv Detail & Related papers (2021-10-14T20:15:45Z)
- Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
Our solutions -- Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian -- leverage theories from both constrained policy optimisation and multi-agent trust region learning.
We also develop the Safe Multi-Agent MuJoCo benchmark suite, which covers a variety of MARL baselines.
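A generic way to realize the constrained-policy-optimisation side of such methods is a Lagrangian actor loss with a dual update on the multiplier; the sketch below follows that standard recipe with assumed names and is not the exact MACPO or MAPPO-Lagrangian update.

```python
import torch

def lagrangian_policy_loss(log_probs, adv_reward, adv_cost, log_lambda,
                           ep_cost, cost_limit):
    """Generic Lagrangian actor loss for one agent (illustrative sketch).
    log_probs: log pi(a|s) of sampled actions; adv_*: reward/cost advantages;
    log_lambda: learnable log-multiplier; ep_cost: average episode cost."""
    lam = log_lambda.exp()
    # Maximize reward advantage while penalizing cost advantage.
    policy_loss = -(log_probs * (adv_reward - lam.detach() * adv_cost)).mean()
    # Dual ascent on the multiplier: grow lambda when episode cost exceeds the limit.
    lambda_loss = -(log_lambda * (ep_cost - cost_limit))
    return policy_loss, lambda_loss
```

The two losses would be minimised by separate optimisers for the actor and the multiplier.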
arXiv Detail & Related papers (2021-10-06T14:17:09Z)
- Decision-Making under On-Ramp Merge Scenarios by Distributional Soft Actor-Critic Algorithm [10.258474373022075]
We propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-Critic (SDSAC).
The results show that SDSAC achieves the best safety performance among the baseline algorithms while also driving efficiently.
arXiv Detail & Related papers (2021-03-08T03:57:32Z)