Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous
Action Spaces
- URL: http://arxiv.org/abs/2108.03952v2
- Date: Wed, 11 Aug 2021 09:25:11 GMT
- Title: Safe Deep Reinforcement Learning for Multi-Agent Systems with Continuous
Action Spaces
- Authors: Ziyad Sheebaelhamd, Konstantinos Zisis, Athina Nisioti, Dimitris
Gkouletsos, Dario Pavllo, Jonas Kohler
- Abstract summary: We enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network.
We propose to circumvent infeasibility problems in the action correction step using soft constraints.
- Score: 5.553946791700077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent control problems constitute an interesting area of application
for deep reinforcement learning models with continuous action spaces. Such
real-world applications, however, typically come with critical safety
constraints that must not be violated. In order to ensure safety, we enhance
the well-known multi-agent deep deterministic policy gradient (MADDPG)
framework by adding a safety layer to the deep policy network. In particular,
we extend the idea of linearizing the single-step transition dynamics, as was
done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent
settings. We additionally propose to circumvent infeasibility problems in the
action correction step using soft constraints (Kerrigan & Maciejowski, 2000).
Results from the theory of exact penalty functions can be used to guarantee
constraint satisfaction of the soft constraints under mild assumptions. We
empirically find that the soft formulation achieves a dramatic decrease in
constraint violations, making safety available even during the learning
procedure.
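To make the action-correction step concrete, here is a minimal sketch of the soft-constrained safety layer as a small quadratic program, assuming linearized constraints of the form c_i(s') ~ c_i(s) + g_i(s)^T a; the function name, shapes, and the penalty weight rho are illustrative choices, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def soft_safety_layer(a_policy, c, G, c_max, rho=100.0):
    """Project a policy action onto the (softened) safe set.

    Minimizes ||a - a_policy||^2 + rho * sum(xi) subject to the
    linearized constraints c + G @ a <= c_max + xi, xi >= 0.
    With rho above the exact-penalty threshold, the slacks xi stay
    at zero whenever the hard constraints are feasible. Names and
    shapes are illustrative, not the paper's implementation.
    """
    n_actions = a_policy.shape[0]
    n_constraints = c.shape[0]
    a = cp.Variable(n_actions)                    # corrected action
    xi = cp.Variable(n_constraints, nonneg=True)  # slack variables

    objective = cp.Minimize(cp.sum_squares(a - a_policy) + rho * cp.sum(xi))
    constraints = [c + G @ a <= c_max + xi]
    cp.Problem(objective, constraints).solve()
    return a.value

# Toy usage: one agent, 2-D action, a single linearized constraint.
a_safe = soft_safety_layer(
    a_policy=np.array([0.8, -0.3]),
    c=np.array([0.9]),          # current constraint value c_i(s)
    G=np.array([[1.0, 0.5]]),   # sensitivity g_i(s)^T
    c_max=np.array([1.0]),
)
print(a_safe)
```

In the multi-agent setting this correction would be applied to the joint (or per-agent) action output of the MADDPG policy network at every step, including during training.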
Related papers
- Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning [26.244121960815907]
We propose a primal-based framework that balances policy optimization between multi-objective learning and constraint adherence.
Our method employs a novel natural policy gradient manipulation technique to optimize multiple RL objectives.
Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
arXiv Detail & Related papers (2024-05-26T00:42:10Z)
- Multi-Constraint Safe RL with Objective Suppression for Safety-Critical Applications [73.58451824894568]
We describe the multi-constraint problem with a stronger Uniformly Constrained MDP (UCMDP) model.
We then propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
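Reading only the summary above, the suppression mechanism might look roughly like the following sketch, in which a safety critic's predicted cost gates the task-reward gradient; the gating rule, interfaces, and thresholds are my assumptions rather than the paper's algorithm.

```python
import torch

def suppressed_loss(task_loss, cost_losses, safety_critic_values, thresholds):
    """Hypothetical objective-suppression combination (sketch only).

    task_loss: scalar policy loss for the reward objective.
    cost_losses: list of scalar losses, one per safety constraint.
    safety_critic_values: predicted constraint costs Q_c(s, a), scalar tensors.
    thresholds: allowed cost budgets d_i.

    The task objective is down-weighted whenever any safety critic
    predicts its budget will be exceeded, so constraint objectives
    dominate near violation.
    """
    # Suppression factor in [0, 1]: 1 far from violation, -> 0 near it.
    margins = torch.stack(
        [torch.clamp(1.0 - q / d, 0.0, 1.0)
         for q, d in zip(safety_critic_values, thresholds)]
    )
    w = margins.min()  # the most-violated constraint dominates
    return w.detach() * task_loss + sum(cost_losses)
```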
- Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments [63.053364805943026]
We extend the approximate model-based shielding framework to the continuous setting.
In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms.
arXiv Detail & Related papers (2024-02-01T17:55:08Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach effectively enforces hard safety constraints and significantly outperforms CMDP-based baseline methods in terms of system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
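The core of the log-barrier idea above fits in a few lines; this sketch omits the adaptive step-size rule that defines LBSGD and uses a fixed barrier weight eta, which is an assumption on my part.

```python
import torch

def log_barrier_objective(reward_objective, constraint_values, limits, eta=0.1):
    """Barrier-smoothed objective: maximize f(x) + eta * sum(log(d_i - c_i(x))).

    The log term tends to -inf as a constraint c_i approaches its
    limit d_i, so gradient ascent steps are repelled from the boundary
    of the safe set. Assumes all constraints are strictly feasible at
    the current point; eta is an illustrative fixed weight.
    """
    slack = torch.stack([d - c for c, d in zip(constraint_values, limits)])
    if (slack <= 0).any():
        raise ValueError("log barrier requires a strictly feasible point")
    return reward_objective + eta * torch.log(slack).sum()
```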
- Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and removes the trust-region constraint by the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
arXiv Detail & Related papers (2022-05-24T06:15:51Z)
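From the summary above, the P3O loss combines a PPO-style clipped surrogate with an exact (ReLU) penalty on the cost surrogate; the sketch below is my reading of that combination, with illustrative names and coefficients.

```python
import torch
import torch.nn.functional as F

def p3o_loss(ratio, adv_r, adv_c, cost_excess, kappa=20.0, clip_eps=0.2):
    """Sketch of a P3O-style penalized clipped objective (my reading,
    not the authors' code).

    ratio: pi_new(a|s) / pi_old(a|s) importance ratios.
    adv_r / adv_c: reward and cost advantage estimates.
    cost_excess: J_C(pi_old) - d, the current constraint violation estimate.
    """
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Standard PPO clipped surrogate for the reward (to be maximized).
    reward_surrogate = torch.min(ratio * adv_r, clipped * adv_r).mean()
    # Cost surrogate; the ReLU makes the penalty exact: inactive while
    # the constraint is satisfied, active only on violation.
    cost_surrogate = (ratio * adv_c).mean() + cost_excess
    penalty = kappa * F.relu(cost_surrogate)
    return -reward_surrogate + penalty  # minimized by the optimizer
```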
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
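A one-function sketch of the state-based reformulation the summary above describes, under the assumption that a learned backward cost value function and a forward cost value function are available (the names and interfaces are hypothetical):

```python
def state_is_safe(backward_cost_vf, forward_cost_vf, state, budget):
    """Sketch of the state-based safety test implied by the summary
    (interfaces are assumptions, not the paper's API).

    backward_cost_vf(s): expected cost already accumulated on paths
    reaching s; forward_cost_vf(s): expected cost-to-go from s. Their
    sum bounds the episode's cumulative cost, so checking it per state
    turns a trajectory-level budget into a state-wise constraint.
    """
    return backward_cost_vf(state) + forward_cost_vf(state) <= budget
```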
- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods [74.49173841304474]
Lagrangian methods exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior.
We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function.
We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
arXiv Detail & Related papers (2020-07-08T08:43:14Z)
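The multiplier update the summary alludes to can be sketched as a small PID controller on the constraint violation; the gains and projection details below are illustrative choices, not the paper's settings.

```python
class PIDLagrangianMultiplier:
    """Sketch of a PID-style Lagrange multiplier update.

    The usual Lagrangian ascent, lambda += lr * (cost - limit), is pure
    integral control, which tends to oscillate and overshoot; adding
    proportional and derivative terms on the violation damps the response.
    """

    def __init__(self, kp=0.1, ki=0.01, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, episode_cost, cost_limit):
        error = episode_cost - cost_limit               # constraint violation
        self.integral = max(0.0, self.integral + self.ki * error)
        derivative = max(0.0, error - self.prev_error)  # react to rising cost
        self.prev_error = error
        # The multiplier is projected to stay non-negative.
        return max(0.0, self.kp * error + self.integral + self.kd * derivative)
```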
- Deep Constrained Q-learning [15.582910645906145]
In many real-world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
arXiv Detail & Related papers (2020-03-20T17:26:03Z)
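A tabular sketch of the restricted Q-update described above, assuming a hypothetical `safe_action_mask` oracle that marks admissible actions; the paper's setting is deep Q-learning, so this only illustrates the induced-MDP idea.

```python
import numpy as np

def constrained_q_update(Q, s, a, r, s_next, safe_action_mask,
                         alpha=0.1, gamma=0.99):
    """Sketch of a tabular constrained Q-update (toy discrete setting).

    The bootstrap maximizes over the safe action set only, so the
    learned Q-function is optimal for the induced constrained MDP and
    its greedy policy never selects a constraint-violating action.
    """
    safe_q = np.where(safe_action_mask(s_next), Q[s_next], -np.inf)
    target = r + gamma * safe_q.max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```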
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.