Compositional Shielding and Reinforcement Learning for Multi-Agent Systems
- URL: http://arxiv.org/abs/2410.10460v1
- Date: Mon, 14 Oct 2024 12:52:48 GMT
- Title: Compositional Shielding and Reinforcement Learning for Multi-Agent Systems
- Authors: Asger Horn Brorholt, Kim Guldstrand Larsen, Christian Schilling
- Abstract summary: Deep reinforcement learning has emerged as a powerful tool for obtaining high-performance policies.
One promising paradigm to guarantee safety is a shield, which shields a policy from making unsafe actions.
In this work, we propose a novel approach for multi-agent shielding.
- Score: 1.124958340749622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning has emerged as a powerful tool for obtaining high-performance policies. However, the safety of these policies has been a long-standing issue. One promising paradigm to guarantee safety is a shield, which shields a policy from making unsafe actions. However, computing a shield scales exponentially in the number of state variables. This is a particular concern in multi-agent systems with many agents. In this work, we propose a novel approach for multi-agent shielding. We address scalability by computing individual shields for each agent. The challenge is that typical safety specifications are global properties, but the shields of individual agents only ensure local properties. Our key to overcome this challenge is to apply assume-guarantee reasoning. Specifically, we present a sound proof rule that decomposes a (global, complex) safety specification into (local, simple) obligations for the shields of the individual agents. Moreover, we show that applying the shields during reinforcement learning significantly improves the quality of the policies obtained for a given training budget. We demonstrate the effectiveness and scalability of our multi-agent shielding framework in two case studies, reducing the computation time from hours to seconds and achieving fast learning convergence.
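The decomposition is the technical heart of the approach: rather than one monolithic shield over the exponentially large joint state space, each agent carries a small local shield, and assume-guarantee reasoning justifies that the local obligations jointly imply the global specification. Classic assume-guarantee rules have the flavor: from ⟨A⟩ M1 ⟨G1⟩ and ⟨G1⟩ M2 ⟨G2⟩, conclude ⟨A⟩ M1 ∥ M2 ⟨G2⟩; the paper's sound proof rule is stated for shields and local obligations. The following minimal Python sketch (hypothetical names and a toy obligation, not the authors' code) illustrates the computational payoff of per-agent shields:

```python
# Hypothetical sketch (not the authors' implementation) of compositional,
# per-agent shielding: each agent gets a small local shield, and the local
# obligations are assumed to come from an assume-guarantee decomposition
# of the global safety specification.
from typing import Callable, Sequence

Action = int
Obs = tuple  # an agent's local observation

class LocalShield:
    """Overrides actions that would violate this agent's local obligation,
    assuming the other agents honor their own guarantees."""

    def __init__(self, is_safe: Callable[[Obs, Action], bool], fallback: Action):
        self.is_safe = is_safe    # local safety check over a small state space
        self.fallback = fallback  # known-safe corrective action

    def filter(self, obs: Obs, proposed: Action) -> Action:
        return proposed if self.is_safe(obs, proposed) else self.fallback

def shielded_joint_step(policies: Sequence[Callable[[Obs], Action]],
                        shields: Sequence[LocalShield],
                        observations: Sequence[Obs]) -> list[Action]:
    # Each agent is shielded independently, so the cost of shielding grows
    # with the number of agents rather than with the exponentially large
    # joint state space.
    return [shield.filter(obs, policy(obs))
            for policy, shield, obs in zip(policies, shields, observations)]

# Toy usage: two agents on a line, local obligation "never enter cell 0".
never_zero = lambda obs, a: obs[0] + a != 0   # obs = (position,), a in {-1, 0, +1}
shields = [LocalShield(never_zero, fallback=0)] * 2
policies = [lambda obs: -1] * 2               # both policies try to move left
print(shielded_joint_step(policies, shields, [(1,), (3,)]))  # -> [0, -1]
```

Note how the filtering touches only each agent's local observation, which is why the individual shields can be computed cheaply and in isolation.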
Related papers
- SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance [48.80398992974831]
SafeAligner is a methodology implemented at the decoding stage to fortify defenses against jailbreak attacks.
We develop two specialized models: the Sentinel Model, which is trained to foster safety, and the Intruder Model, designed to generate riskier responses.
We show that SafeAligner can increase the likelihood of beneficial tokens, while reducing the occurrence of harmful ones.
arXiv Detail & Related papers (2024-06-26T07:15:44Z)
- Verification-Guided Shielding for Deep Reinforcement Learning [4.418183967223081]
Deep Reinforcement Learning (DRL) has emerged as an effective approach to solving real-world tasks.
Various methods have been put forth to address DRL's reliability concerns by providing formal safety guarantees.
We present verification-guided shielding -- a novel approach that bridges the DRL reliability gap by integrating these two methods.
arXiv Detail & Related papers (2024-06-10T17:44:59Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Safety Shielding under Delayed Observation [59.86192283565134]
Shields are correct-by-construction enforcers that guarantee safe execution.
Shields should pick safe corrective actions in a way that minimizes future interference.
We present the first integration of shields in a realistic driving simulator.
arXiv Detail & Related papers (2023-07-05T10:06:10Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but lack safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield estimates the probability that each available action leads to a safety violation.
Based on this probability and a given threshold, the shield decides whether to block the action (see the hedged sketch after this list).
arXiv Detail & Related papers (2022-12-04T16:00:29Z)
- Near-Optimal Multi-Agent Learning for Safe Coverage Control [76.99020416197631]
In multi-agent coverage control problems, agents navigate their environment to reach locations that maximize the coverage of some density.
In this paper, we aim to efficiently learn the density to approximately solve the coverage problem while preserving the agents' safety.
We give first-of-their-kind results: near-optimal coverage in finite time while provably guaranteeing safety.
arXiv Detail & Related papers (2022-10-12T16:33:34Z)
- Safe Reinforcement Learning via Shielding for POMDPs [29.058332307331785]
Reinforcement learning (RL) in safety-critical environments requires an agent to avoid decisions with catastrophic consequences.
We propose and thoroughly evaluate a tight integration of formally-verified shields for POMDPs with state-of-the-art deep RL algorithms.
We empirically demonstrate that an RL agent using a shield, beyond being safe, converges to higher values of expected reward.
arXiv Detail & Related papers (2022-04-02T03:51:55Z)
- Safe Multi-Agent Reinforcement Learning via Shielding [29.49529835154155]
Multi-agent reinforcement learning (MARL) has been increasingly used in a wide range of safety-critical applications.
Current MARL methods do not have safety guarantees.
We present two shielding approaches for safe MARL.
arXiv Detail & Related papers (2021-01-27T04:27:06Z)
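As a hedged illustration of the probability-threshold mechanism summarized in the Online Shielding entry above, here is a minimal sketch; the estimator signature, threshold value, and fallback rule are illustrative assumptions, not taken from any of the papers:

```python
# Minimal sketch of a runtime probability-threshold shield.
# `violation_prob(state, action)` is an assumed estimator of the probability
# that taking `action` in `state` leads to a safety violation; it is a
# placeholder, not an API from the paper.
from typing import Callable

def allowed_actions(state: object,
                    actions: list[int],
                    violation_prob: Callable[[object, int], float],
                    threshold: float = 0.05) -> list[int]:
    """Keep only the actions whose estimated violation probability is below the threshold."""
    safe = [a for a in actions if violation_prob(state, a) < threshold]
    # If every action looks unsafe, fall back to the least risky one rather
    # than leaving the agent with no action to take.
    return safe or [min(actions, key=lambda a: violation_prob(state, a))]
```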