An Abstraction-based Method to Verify Multi-Agent Deep
Reinforcement-Learning Behaviours
- URL: http://arxiv.org/abs/2102.01434v1
- Date: Tue, 2 Feb 2021 11:12:30 GMT
- Title: An Abstraction-based Method to Verify Multi-Agent Deep
Reinforcement-Learning Behaviours
- Authors: Pierre El Mqirmi, Francesco Belardinelli and Borja G. León
- Abstract summary: Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents.
We present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints.
- Score: 8.95294551927446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (RL) often struggles to ensure the safe
behaviours of the learning agents, and is therefore generally not suited to
safety-critical applications. To address this issue, we present a methodology
that combines formal verification with (deep) RL algorithms to guarantee the
satisfaction of formally-specified safety constraints both in training and
testing. The approach we propose expresses the constraints to verify in
Probabilistic Computation Tree Logic (PCTL) and builds an abstract
representation of the system to reduce the complexity of the verification step.
This abstract model allows model-checking techniques to identify a set of
abstract policies that meet the safety constraints expressed in PCTL. Then, the
agents' behaviours are restricted according to these safe abstract policies. We
provide formal guarantees that by using this method, the actions of the agents
always meet the safety constraints, and provide a procedure to generate an
abstract model automatically. We empirically evaluate and show the
effectiveness of our method in a multi-agent environment.
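To make the final restriction step concrete, below is a minimal sketch (not the authors' code) of gating a learned policy with verified safe abstract policies. The `abstract_of` map and the `SAFE_ACTIONS` table are hypothetical stand-ins for the abstraction and the PCTL-verified policy set the paper's procedure would produce.

```python
# Minimal sketch (not the authors' code): restrict agents' actions to a
# precomputed set of safe abstract policies, as described in the abstract.
import random

# Safe abstract policies: for each abstract state, the set of actions that
# model checking verified against the PCTL safety constraints (hypothetical).
SAFE_ACTIONS = {
    "far_from_hazard": {"up", "down", "left", "right"},
    "near_hazard": {"left", "right"},  # moving toward the hazard was pruned
}

def abstract_of(state):
    """Map a concrete state to its abstract state (hypothetical abstraction)."""
    x, y = state
    return "near_hazard" if abs(x) + abs(y) <= 1 else "far_from_hazard"

def restricted_action(agent_policy, state):
    """Let the RL policy choose, but only among actions the verified
    abstract policy allows in the current abstract state."""
    allowed = SAFE_ACTIONS[abstract_of(state)]
    ranked = agent_policy(state)          # actions sorted by preference
    for action in ranked:
        if action in allowed:
            return action
    return random.choice(tuple(allowed))  # fall back to any safe action

# Example: a fixed preference ordering standing in for a learned policy.
policy = lambda s: ["up", "left", "down", "right"]
print(restricted_action(policy, (1, 0)))  # near hazard -> "left"
```

The learned policy keeps its own preference ordering; the restriction only intervenes when the preferred action falls outside the verified set, which is why the safety constraints hold during both training and testing.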
Related papers
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Online Safety Property Collection and Refinement for Safe Deep Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and uses them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that CROP achieves higher returns and fewer violations than previous Safe DRL approaches.
arXiv Detail & Related papers (2023-02-13T21:19:36Z)
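As a rough illustration of the CROP idea above, the sketch below accumulates unsafe interactions flagged by a cost signal into state-region properties. The region coarsening and the property format are assumptions for illustration, not the paper's actual design.

```python
# Hedged sketch of the CROP idea: use a cost signal observed during training
# to collect unsafe interactions and turn them into safety properties.
from collections import defaultdict

class OnlinePropertyCollector:
    def __init__(self, region_fn):
        self.region_fn = region_fn              # coarsens states into regions
        self.properties = defaultdict(set)      # region -> forbidden actions

    def observe(self, state, action, cost):
        """Record an unsafe interaction whenever the cost signal fires."""
        if cost > 0:
            self.properties[self.region_fn(state)].add(action)

    def violates(self, state, action):
        """Check a candidate step against the collected properties."""
        return action in self.properties[self.region_fn(state)]

# Usage: after a collision at x=0.93 while moving forward...
collector = OnlinePropertyCollector(region_fn=lambda s: round(s, 1))
collector.observe(0.93, "forward", cost=1.0)
print(collector.violates(0.91, "forward"))  # True: same coarse region
```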
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
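The "safety projection" half of USL can be pictured as nudging a proposed action until a learned state-wise cost estimate satisfies the constraint. The sketch below uses a toy quadratic cost in place of a learned cost critic, and finite-difference gradients for simplicity.

```python
# Illustrative sketch of safety projection: gradient-descend on a cost
# estimate until the state-wise constraint holds. `cost_fn` is a toy
# quadratic standing in for a learned cost critic.
import numpy as np

def project_action(action, cost_fn, threshold=0.0, lr=0.1, steps=50):
    """Nudge the action until cost_fn(action) <= threshold."""
    a = np.asarray(action, dtype=float)
    for _ in range(steps):
        if cost_fn(a) <= threshold:
            break
        eps = 1e-5                               # finite-difference gradient
        grad = np.array([(cost_fn(a + eps * e) - cost_fn(a)) / eps
                         for e in np.eye(len(a))])
        a -= lr * grad
    return a

# Toy cost: positive inside a ball of radius 1 around the unsafe point (2, 0).
unsafe = np.array([2.0, 0.0])
cost = lambda a: 1.0 - np.sum((a - unsafe) ** 2)
print(project_action([1.5, 0.0], cost))  # pushed outside the unsafe ball
```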
- Risk Consistent Multi-Class Learning from Label Proportions [64.0125322353281]
This study addresses a multiclass learning from label proportions (MCLLP) setting in which training instances are provided in bags.
Most existing MCLLP methods impose bag-wise constraints on the prediction of instances or assign them pseudo-labels.
A risk-consistent method is proposed for instance classification using the empirical risk minimization framework.
arXiv Detail & Related papers (2022-03-24T03:49:04Z)
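For intuition on the label-proportions setting, the sketch below shows the generic proportion-matching loss, where the mean predicted class distribution over a bag is matched to the bag's known proportions. This illustrates the setting, not the paper's specific risk-consistent estimator.

```python
# Hedged sketch of learning from label proportions: each bag only reveals
# its class proportions, and the model's mean predicted distribution over
# the bag is matched against them.
import numpy as np

def bag_proportion_loss(probs, proportions):
    """Cross-entropy between a bag's class proportions and the mean of the
    model's predicted class probabilities over that bag."""
    mean_pred = probs.mean(axis=0)                 # shape: (n_classes,)
    return -np.sum(proportions * np.log(mean_pred + 1e-12))

# A bag of 4 instances, 3 classes; true proportions: half class 0, half class 2.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.1, 0.1, 0.8]])
print(bag_proportion_loss(probs, np.array([0.5, 0.0, 0.5])))
```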
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
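A toy version of the interval-MDP computation: value iteration in which an adversary redistributes transition mass within the given intervals toward unsafe outcomes, yielding a pessimistic bound on the unsafe-reachability probability. The three-state model and the greedy mass assignment are illustrative simplifications, not the paper's abstraction or solvers.

```python
# Sketch of interval value iteration over an interval MDP.
# States: 0 = safe, 1 = risky, 2 = unsafe (absorbing).
# intervals[s][a] = list of (low, high) transition bounds per successor.
intervals = {
    0: {"stay": [(0.8, 1.0), (0.0, 0.2), (0.0, 0.0)],
        "move": [(0.3, 0.6), (0.3, 0.6), (0.0, 0.1)]},
    1: {"stay": [(0.0, 0.2), (0.5, 0.9), (0.1, 0.3)],
        "move": [(0.5, 0.8), (0.0, 0.2), (0.1, 0.4)]},
}

def worst_case_unsafe_prob(iters=100):
    v = [0.0, 0.0, 1.0]          # prob. of eventually reaching the unsafe state
    for _ in range(iters):
        new = v[:]
        for s, actions in intervals.items():
            vals = []
            for bounds in actions.values():
                # Adversarial nature: put as much mass as the intervals
                # allow on high-value (more unsafe) successors.
                order = sorted(range(3), key=lambda t: -v[t])
                mass = [lo for lo, _ in bounds]
                budget = 1.0 - sum(mass)
                for t in order:
                    extra = min(bounds[t][1] - mass[t], budget)
                    mass[t] += extra
                    budget -= extra
                vals.append(sum(mass[t] * v[t] for t in range(3)))
            new[s] = min(vals)   # the policy picks the safest action
        v = new
    return v[0]

print(worst_case_unsafe_prob())  # upper bound on the unsafe probability
```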
- Reinforcement Learning for Task Specifications with Action-Constraints [4.046919218061427]
We propose a method to learn optimal control policies for a finite-state Markov Decision Process.
We assume that the set of action sequences deemed unsafe and/or safe is given in terms of a finite-state automaton.
We present a version of the Q-learning algorithm for learning optimal policies in the presence of non-Markovian action-sequence and state constraints.
arXiv Detail & Related papers (2022-01-02T04:22:01Z)
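The construction can be sketched as Q-learning over the product of the environment state and the automaton state, with actions the automaton rejects masked out. The two-state automaton below ("no two consecutive fire actions") is an invented example, not the paper's benchmark.

```python
# Sketch: Q-learning over (environment state, automaton state), masking
# actions for which the automaton has no safe transition.
import random
from collections import defaultdict

ACTIONS = ["move", "fire"]
AUTOMATON = {                    # (automaton state, action) -> next state
    ("idle", "move"): "idle", ("idle", "fire"): "armed",
    ("armed", "move"): "idle",   # ("armed", "fire") is missing => unsafe
}

def safe_actions(q_state):
    return [a for a in ACTIONS if (q_state, a) in AUTOMATON]

Q = defaultdict(float)

def step_q_learning(s, q_state, env_step, alpha=0.1, gamma=0.9, eps=0.1):
    """One epsilon-greedy Q-learning update on the product state."""
    allowed = safe_actions(q_state)
    a = (random.choice(allowed) if random.random() < eps
         else max(allowed, key=lambda a: Q[(s, q_state, a)]))
    s2, r = env_step(s, a)
    q2 = AUTOMATON[(q_state, a)]
    best_next = max(Q[(s2, q2, a2)] for a2 in safe_actions(q2))
    Q[(s, q_state, a)] += alpha * (r + gamma * best_next - Q[(s, q_state, a)])
    return s2, q2

# Toy environment: position on a line, "fire" is rewarded at position 3.
env = lambda s, a: (s + 1, 0.0) if a == "move" else (s, 1.0 if s == 3 else 0.0)
s, q = 0, "idle"
for _ in range(1000):
    s, q = step_q_learning(s % 5, q, env)
```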
- Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report) [0.0]
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL).
We derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it.
We show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees.
arXiv Detail & Related papers (2021-12-17T17:57:32Z)
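Very loosely, the distillation pipeline maps states visited under the RL policy into a small discrete latent space and fits latent transition probabilities. In the sketch below, k-means stands in for the paper's variational autoencoder, so it shows only the shape of the resulting discrete latent model, not the bisimulation-guaranteed training.

```python
# Simplified sketch of distilling a discrete latent model from trajectories.
# K-means is a stand-in for the paper's VAE encoder (a deliberate swap).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))            # states visited by the policy
next_states = states + 0.1 * rng.normal(size=(500, 4))

K = 8
encoder = KMeans(n_clusters=K, n_init=10, random_state=0).fit(states)
z, z_next = encoder.predict(states), encoder.predict(next_states)

# Empirical latent transition matrix P[z, z'] of the discrete model.
P = np.zeros((K, K))
for a, b in zip(z, z_next):
    P[a, b] += 1
P /= np.maximum(P.sum(axis=1, keepdims=True), 1)
print(np.round(P, 2))
```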
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
On standard benchmarks, our method obtains results comparable to those of formal verifiers.
Our approach enables safety properties of decision-making models to be evaluated efficiently in practical applications.
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
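The core operation behind interval-based verification is propagating an input box through the network to bound its outputs. Below is a minimal interval-arithmetic sketch with hand-picked weights (illustrative values, not the paper's benchmarks).

```python
# Minimal interval analysis: bound the outputs of a small ReLU network
# over a box of inputs, the basic step of interval-based verification.
import numpy as np

def interval_linear(lo, hi, W, b):
    """Exact interval image of an affine layer W @ x + b."""
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def interval_forward(lo, hi, layers):
    *hidden, last = layers
    for W, b in hidden:
        lo, hi = interval_linear(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # interval ReLU
    return interval_linear(lo, hi, *last)                # linear output layer

layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)),
          (np.array([[1.0, 1.0]]), np.array([-0.5]))]
lo, hi = interval_forward(np.array([0.0, 0.0]), np.array([0.1, 0.1]), layers)
print(lo, hi)  # if hi < 0 everywhere, the property "output > 0" cannot occur
```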
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
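The translation from a cumulative budget into a state-based constraint can be pictured as follows: an action is admissible in a state if the cost accrued on the way to that state (a backward estimate) plus the expected cost from it onward (a forward estimate) stays within the budget. All numbers in the sketch are hand-set placeholders; in the paper both quantities are learned.

```python
# Hedged sketch of the backward-value-function idea: turn the cumulative
# constraint "total cost <= d" into a per-state admissibility check.
d = 1.0                                   # cost budget for the episode

# Forward cost-to-go per (state, action) and backward cost-so-far per state;
# illustrative constants standing in for learned value functions.
forward_cost = {("s1", "risky"): 0.8, ("s1", "safe"): 0.2}
backward_cost = {"s1": 0.3}

def admissible(state, action):
    """State-based constraint induced by the cumulative budget."""
    return backward_cost[state] + forward_cost[(state, action)] <= d

print([a for a in ("risky", "safe") if admissible("s1", a)])  # ['safe']
```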
- Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms increases by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z)
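Schematically, the claim can be written as follows (a restatement of the abstract above, not the paper's exact bound):

```latex
% Schematic restatement of the abstract's claim, with N the number of
% safety constraints; the precise dependence is given in the paper.
\[
  m_{\text{constrained}}(\epsilon,\delta)
  \;=\; O\!\left( m_{\text{unconstrained}}(\epsilon,\delta)\cdot \log N \right),
\]
where $m(\epsilon,\delta)$ denotes the number of samples needed for
$\epsilon$-accuracy with probability $1-\delta$.
```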
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
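A loose sketch of the three SAMBA ingredients: a Gaussian-process dynamics model (probabilistic modelling), predictive variance as an information score (active exploration), and a pessimistic safety filter on candidate actions. The thresholds, the GP kernel, and the acquisition rule are all illustrative assumptions, not the paper's actual objective.

```python
# Loose sketch of SAMBA-style safe active learning: explore where the GP
# model is most uncertain, but only among actions whose pessimistic
# prediction respects a safety threshold.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(30, 2))          # (state, action) pairs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]           # observed next-state component
gp = GaussianProcessRegressor().fit(X, y)

def pick_action(state, candidates, unsafe_above=0.8):
    """Pick the most informative (highest-variance) action whose pessimistic
    prediction stays below the unsafe threshold; falls back to the first
    candidate if none passes the filter."""
    queries = np.array([[state, a] for a in candidates])
    mean, std = gp.predict(queries, return_std=True)
    scores = [s if m + 2 * s < unsafe_above else -np.inf   # safety filter
              for m, s in zip(mean, std)]
    return candidates[int(np.argmax(scores))]

print(pick_action(0.2, [-0.5, 0.0, 0.5]))
```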