Related papers: Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning

Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2411.04867v1
Date: Thu, 07 Nov 2024 16:59:32 GMT
Title: Think Smart, Act SMARL! Analyzing Probabilistic Logic Driven Safety in Multi-Agent Reinforcement Learning
Authors: Satchit Chatterji, Erman Acar,
Abstract summary: This paper introduces Shielded MARL (SMARL) to enable shielded independent Q-learning. It also introduces Probabilistic Logic Temporal Difference Learning (PLTD) to enable shielded independent Q-learning. $ii$ show its positive effect and use as an equilibrium selection mechanism in various game-theoretic environments.
Score: 3.0846824529023382
License: http://creativecommons.org/licenses/by/4.0/
Abstract: An important challenge for enabling the deployment of reinforcement learning (RL) algorithms in the real world is safety. This has resulted in the recent research field of Safe RL, which aims to learn optimal policies that are safe. One successful approach in that direction is probabilistic logic shields (PLS), a model-based Safe RL technique that uses formal specifications based on probabilistic logic programming, constraining an agent's policy to comply with those specifications in a probabilistic sense. However, safety is inherently a multi-agent concept, since real-world environments often involve multiple agents interacting simultaneously, leading to a complex system which is hard to control. Moreover, safe multi-agent RL (Safe MARL) is still underexplored. In order to address this gap, in this paper we ($i$) introduce Shielded MARL (SMARL) by extending PLS to MARL -- in particular, we introduce Probabilistic Logic Temporal Difference Learning (PLTD) to enable shielded independent Q-learning (SIQL), and introduce shielded independent PPO (SIPPO) using probabilistic logic policy gradients; ($ii$) show its positive effect and use as an equilibrium selection mechanism in various game-theoretic environments including two-player simultaneous games, extensive-form games, stochastic games, and some grid-world extensions in terms of safety, cooperation, and alignment with normative behaviors; and ($iii$) look into the asymmetric case where only one agent is shielded, and show that the shielded agent has a significant influence on the unshielded one, providing further evidence of SMARL's ability to enhance safety and cooperation in diverse multi-agent environments.

Related papers

Probabilistic Shielding for Safe Reinforcement Learning [51.35559820893218]
In real-life scenarios, a Reinforcement Learning (RL) agent must often also behave in a safe manner, including at training time. We present a new, scalable method, which enjoys strict formal guarantees for Safe RL. We show that our approach provides a strict formal safety guarantee that the agent stays safe at training and test time.
arXiv Detail & Related papers (2025-03-09T17:54:33Z)
Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System [1.0124625066746598]
Existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety.<n>We propose a novel framework, Scalable Safe MARL (SS-MARL), to enhance the safety and scalability of MARL methods.<n>We show that SS-MARL achieves a better trade-off between optimality and safety compared to baselines, and its scalability significantly outperforms the latest methods in scenarios with a large number of agents.
arXiv Detail & Related papers (2025-01-23T15:01:19Z)
Safe Multi-Agent Reinforcement Learning with Convergence to Generalized Nash Equilibrium [6.169364905804677]
Multi-agent reinforcement learning (MARL) has achieved notable success in cooperative tasks. deploying MARL agents in real-world applications presents critical safety challenges. We propose a novel theoretical framework for safe MARL with $textitstate-wise$ constraints, where safety requirements are enforced at every state the agents visit. For practical deployment in complex high-dimensional systems, we propose $textitMulti-Agent Dual Actor-Critic$ (MADAC)
arXiv Detail & Related papers (2024-11-22T16:08:42Z)
DeepSafeMPC: Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning [11.407941376728258]
We propose a novel method called Deep Learning-Based Model Predictive Control for Safe Multi-Agent Reinforcement Learning (DeepSafeMPC) The key insight of DeepSafeMPC is leveraging a entralized deep learning model to well predict environmental dynamics. We demonstrate the effectiveness of our approach using the Safe Multi-agent MuJoCo environment.
arXiv Detail & Related papers (2024-03-11T03:17:33Z)
Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states. In safety-critical domains, such behaviors could lead to disastrous outcomes. We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases. Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z)
Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent. Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control. The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents. During runtime, the shield analyses the safety of each available action. Based on this probability and a given threshold, the shield decides whether to block an action from the agent.
arXiv Detail & Related papers (2022-12-04T16:00:29Z)
Safe Reinforcement Learning via Shielding for POMDPs [29.058332307331785]
Reinforcement learning (RL) in safety-critical environments requires an agent to avoid decisions with catastrophic consequences. We propose and thoroughly evaluate a tight integration of formally-verified shields for POMDPs with state-of-the-art deep RL algorithms. We empirically demonstrate that an RL agent using a shield, beyond being safe, converges to higher values of expected reward.
arXiv Detail & Related papers (2022-04-02T03:51:55Z)
Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
Multi-Agent Constrained Policy Optimisation [17.772811770726296]
We formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods. Our solutions -- Multi-Agent Constrained Policy optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning. We develop the benchmark suite of Safe Multi-Agent MuJoCo that involves a variety of MARL baselines.
arXiv Detail & Related papers (2021-10-06T14:17:09Z)
Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multiagent sim-to-real gaps. ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations. In particular, ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complextemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)
Assured Learning-enabled Autonomy: A Metacognitive Reinforcement Learning Framework [4.427447378048202]
Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across variety of circumstances. An assured autonomous control framework is presented in this paper by empowering RL algorithms with metacognitive learning capabilities.
arXiv Detail & Related papers (2021-03-23T14:01:35Z)
Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment. We develop the first multi-agent reinforcement learning technique for temporal logic specifications. We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.