Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2411.04867v2
- Date: Wed, 14 May 2025 13:30:31 GMT
- Title: Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning
- Authors: Satchit Chatterji, Erman Acar
- Abstract summary: We propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a framework for steering MARL toward norm-compliant outcomes. Key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) a comprehensive evaluation across symmetric and asymmetrically shielded $n$-player game-theoretic benchmarks. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
- Score: 3.0846824529023382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal to enforce safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) a comprehensive evaluation across symmetric and asymmetrically shielded $n$-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
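To make the flavor of contribution (1) concrete, the following is a minimal tabular sketch of what a probabilistic-logic-shielded TD update could look like, assuming the shield supplies a per-action safety probability for each state; the function names and the exact weighting are illustrative assumptions, not the paper's definition of PLTD:

```python
import numpy as np

def shielded_policy(q_values, p_safe, temperature=1.0):
    """Softmax policy reweighted by per-action safety probabilities.

    p_safe[a] is the shield's probability that action a is safe in the
    current state (e.g., inferred by a probabilistic logic program).
    """
    prefs = np.exp(q_values / temperature) * p_safe
    return prefs / prefs.sum()

def pltd_style_update(Q, s, a, r, s_next, p_safe_next, alpha=0.1, gamma=0.99):
    """One expected-SARSA-style TD update whose bootstrap target is taken
    under the shielded policy rather than a plain max or softmax.
    A hypothetical variant for illustration; the paper's PLTD update may differ.

    Q is a [n_states, n_actions] array; s, a, s_next are integer indices.
    """
    pi_next = shielded_policy(Q[s_next], p_safe_next)
    target = r + gamma * np.dot(pi_next, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Reweighting by safety probabilities (rather than hard-masking actions) keeps the update differentiable in the shield's outputs, which is also what makes the shielded policy-gradient variant in contribution (2) plausible.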
Related papers
- Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often behave safely, including at training time.
We present a new, scalable method for Safe RL that enjoys strict formal guarantees: the agent provably stays safe at both training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z)
- Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System [1.0124625066746598]
Existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety.
We propose a novel framework, Scalable Safe MARL (SS-MARL), to enhance the safety and scalability of MARL methods.
We show that SS-MARL achieves a better trade-off between optimality and safety than the baselines, and that it scales significantly better than the latest methods in scenarios with a large number of agents.
arXiv Detail & Related papers (2025-01-23T15:01:19Z)
- Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks.
However, deploying MARL agents in real-world applications presents critical safety challenges.
We propose a novel theoretical framework for safe MARL with $\textit{state-wise}$ constraints, where safety requirements are enforced at every state the agents visit.
For practical deployment in complex, high-dimensional systems, we propose $\textit{Multi-Agent Dual Actor-Critic}$ (MADAC).
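For intuition, a state-wise constraint is strictly stronger than the standard expectation-based CMDP constraint. The side-by-side formulation below is an illustrative sketch with a generic per-state cost $c$ and budget $d$, not notation taken from the paper:

\[
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
\quad \text{(standard CMDP: cost bounded in expectation)}
\qquad \text{vs.} \qquad
c(s_t) \le d \;\; \text{for every state } s_t \text{ visited under } \pi
\quad \text{(state-wise)}
\]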
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
- DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning [11.407941376728258]
We propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC).
The key insight of DeepSafeMPC is to leverage a centralized deep learning model to accurately predict environmental dynamics.
We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment.
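As a rough illustration of the idea (a learned, centralized dynamics model used to screen actions before execution), here is a minimal sketch; `dynamics_model`, `cost_fn`, and the fallback rule are assumptions for exposition, not DeepSafeMPC's actual interface:

```python
import numpy as np

def mpc_safety_filter(dynamics_model, cost_fn, state, candidate_actions,
                      horizon=5, cost_limit=1.0):
    """Score candidate joint actions by rolling out a learned dynamics model.

    dynamics_model(state, action) -> predicted next state (hypothetical API).
    Returns the first candidate whose predicted cumulative safety cost stays
    under cost_limit, falling back to the lowest-cost candidate otherwise.
    """
    best_action, best_cost = None, np.inf
    for action in candidate_actions:
        s, total_cost = state, 0.0
        for _ in range(horizon):
            s = dynamics_model(s, action)   # predicted environment step
            total_cost += cost_fn(s)        # predicted safety cost
        if total_cost <= cost_limit:
            return action                   # first predicted-safe candidate
        if total_cost < best_cost:
            best_action, best_cost = action, total_cost
    return best_action                      # least-unsafe fallback
```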
arXiv Detail & Related papers (2024-03-11T03:17:33Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains risk only in expectation, which leaves room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task-reward-maximizing objective according to a safety critic, as sketched below.
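One way to picture objective suppression is as a risk-dependent weight on the task objective. The sketch below is a hypothetical reading of that idea; the `safety_q` critic and the sigmoid gating are chosen for illustration, not taken from the paper:

```python
import torch

def suppressed_objective(task_returns, safety_q, log_probs, kappa=5.0):
    """Policy-gradient loss that damps the task term when the safety critic
    predicts high risk. Illustrative only, not the paper's exact rule.

    task_returns: advantage/return estimates for the task reward
    safety_q:     safety critic values (higher = safer) for the same steps
    log_probs:    log-probabilities of the taken actions
    """
    # Suppression weight in (0, 1): approaches 0 as predicted safety drops.
    w = torch.sigmoid(kappa * safety_q)
    # The task objective is scaled by w; the safety objective always applies.
    return -(log_probs * (w * task_returns + safety_q)).mean()
```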
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate performance superior to other safety-aware approaches on a set of Atari games with state-dependent safety labels.
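A look-ahead shield of this kind can be approximated by Monte Carlo rollouts of a learned world model; the sketch below assumes a hypothetical `world_model.step` interface and is only meant to convey the mechanism:

```python
def lookahead_shield(world_model, is_unsafe, policy, state, action,
                     horizon=8, n_samples=32, risk_tol=0.05):
    """Estimate, by sampling rollouts from a learned world model, the
    probability that taking `action` in `state` leads to an unsafe state
    within `horizon` steps. Returns True if the action is allowed.
    Illustrative sketch; the API names are assumptions, not the paper's code.
    """
    violations = 0
    for _ in range(n_samples):
        s = world_model.step(state, action)      # hypothetical model API
        for _ in range(horizon - 1):
            if is_unsafe(s):
                violations += 1
                break
            s = world_model.step(s, policy(s))   # follow the current policy
        else:
            violations += is_unsafe(s)           # check the final state too
    return violations / n_samples <= risk_tol
```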
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
- Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but lack safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
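The decoupling can be pictured as a simple runtime handoff between the two agents. The sketch below is a hypothetical reading of the scheme; the `risk_critic` and the threshold rule are invented for illustration:

```python
def dual_agent_act(baseline_policy, safe_policy, risk_critic, state,
                   risk_threshold=0.2):
    """Safety correction from a baseline: the baseline agent proposes an
    action; if the risk critic deems it too risky, the safe agent's
    corrective action is used instead. Not the paper's implementation.
    """
    action = baseline_policy(state)
    if risk_critic(state, action) > risk_threshold:
        action = safe_policy(state)   # conservative, corrective action
    return action
```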
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield estimates the probability that each available action is safe.
Based on this probability and a given threshold, the shield decides whether to block an action from the agent.
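The blocking rule itself is simple to state. A minimal sketch, assuming the shield exposes per-action safety probabilities and that some fallback is needed when every action would be blocked:

```python
def shield_filter(p_safe, actions, threshold=0.9):
    """Online shielding: keep only actions whose estimated probability of
    remaining safe meets the threshold. p_safe maps action -> probability.
    A minimal sketch of the blocking rule described in the abstract.
    """
    allowed = [a for a in actions if p_safe[a] >= threshold]
    # If every action falls below the threshold, fall back to the least
    # unsafe one so the agent is never left without any action to take.
    return allowed if allowed else [max(actions, key=lambda a: p_safe[a])]
```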
arXiv Detail & Related papers (2022-12-04T16:00:29Z)
- Safe Reinforcement Learning via Shielding for POMDPs [29.058332307331785]
Reinforcement learning (RL) in safety-critical environments requires an agent to avoid decisions with catastrophic consequences.
We propose and thoroughly evaluate a tight integration of formally verified shields for POMDPs with state-of-the-art deep RL algorithms.
We empirically demonstrate that an RL agent using a shield, beyond being safe, converges to higher values of expected reward.
arXiv Detail & Related papers (2022-04-02T03:51:55Z)
- Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
- Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods.
Our solutions, Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian, leverage theory from both constrained policy optimisation and multi-agent trust region learning.
We also develop Safe Multi-Agent MuJoCo, a benchmark suite that includes a variety of MARL baselines.
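For intuition, a Lagrangian variant of clipped PPO couples the usual surrogate loss with a cost penalty whose multiplier is updated by dual ascent. The sketch below is in that spirit and is not the paper's exact objective:

```python
import torch

def mappo_lagrangian_loss(ratio, advantages, cost_advantages, lam,
                          clip_eps=0.2):
    """PPO-style clipped surrogate with a Lagrangian cost penalty, in the
    spirit of MAPPO-Lagrangian. Illustrative only.

    ratio:           pi_new(a|s) / pi_old(a|s)
    advantages:      reward-advantage estimates
    cost_advantages: cost-advantage estimates (higher = more constraint cost)
    lam:             current Lagrange multiplier (lambda >= 0)
    """
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    reward_term = torch.min(ratio * advantages, clipped * advantages)
    cost_term = ratio * cost_advantages
    # Maximize the reward surrogate, penalize expected constraint cost.
    return -(reward_term - lam * cost_term).mean()

def update_multiplier(lam, avg_cost, cost_limit, lr=0.01):
    """Dual ascent: raise lambda when average cost exceeds the limit."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```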
arXiv Detail & Related papers (2021-10-06T14:17:09Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to reward-function sim-to-real gaps in multi-agent settings.
In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)
- Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework [4.427447378048202]
Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across a variety of circumstances.
This paper presents an assured autonomous control framework that empowers RL algorithms with metacognitive learning capabilities.
arXiv Detail & Related papers (2021-03-23T14:01:35Z)
- Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)