Probabilistic Reach-Avoid for Bayesian Neural Networks
- URL: http://arxiv.org/abs/2310.01951v1
- Date: Tue, 3 Oct 2023 10:52:21 GMT
- Title: Probabilistic Reach-Avoid for Bayesian Neural Networks
- Authors: Matthew Wicker, Luca Laurenti, Andrea Patane, Nicola Paoletti,
Alessandro Abate, Marta Kwiatkowska
- Abstract summary: We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
- Score: 71.67052234622781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based reinforcement learning seeks to simultaneously learn the dynamics
of an unknown stochastic environment and synthesise an optimal policy for
acting in it. Ensuring the safety and robustness of sequential decisions made
through a policy in such an environment is a key challenge for policies
intended for safety-critical scenarios. In this work, we investigate two
complementary problems: first, computing reach-avoid probabilities for
iterative predictions made with dynamical models whose dynamics are described
by a Bayesian neural network (BNN); second, synthesising control policies that are
optimal with respect to a given reach-avoid specification (reaching a "target"
state, while avoiding a set of "unsafe" states) and a learned BNN model. Our
solution leverages interval propagation and backward recursion techniques to
compute lower bounds for the probability that a policy's sequence of actions
leads to satisfying the reach-avoid specification. Such computed lower bounds
provide safety certification for the given policy and BNN model. We then
introduce control synthesis algorithms to derive policies maximizing said lower
bounds on the safety probability. We demonstrate the effectiveness of our
method on a series of control benchmarks characterized by learned BNN dynamics
models. On our most challenging benchmark, compared to purely data-driven
policies, the optimal synthesis algorithm provides more than a four-fold
increase in the number of certifiable states and more than a three-fold
increase in the average guaranteed reach-avoid probability.
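As a rough, self-contained illustration of the backward recursion, the sketch below computes reach-avoid lower bounds over a small finite abstraction of the state space. The interval transition probabilities stand in for the bounds that, in the paper, would come from propagating intervals through the learned BNN dynamics under the fixed policy; the grid, the probability intervals, and the helper names (`worst_case_expectation`, `reach_avoid_lower_bound`) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: interval transition probabilities below are
# placeholders for bounds obtained (in the paper) via interval propagation
# through a BNN dynamics model under a fixed policy.

def worst_case_expectation(values, intervals):
    """Lower-bound E[V(x')] over all distributions consistent with the
    per-successor probability intervals (p_lo, p_hi): push as much mass
    as allowed onto the lowest-value successors."""
    order = sorted(intervals, key=lambda s: values[s])   # low-value successors first
    p = {s: intervals[s][0] for s in intervals}          # start from lower bounds
    budget = 1.0 - sum(p.values())                       # remaining probability mass
    for s in order:
        bump = min(intervals[s][1] - p[s], budget)
        p[s] += bump
        budget -= bump
    return sum(p[s] * values[s] for s in intervals)


def reach_avoid_lower_bound(cells, target, unsafe, transitions, horizon):
    """V[x] lower-bounds the probability of reaching `target` within
    `horizon` steps while never entering `unsafe`."""
    V = {x: (1.0 if x in target else 0.0) for x in cells}
    for _ in range(horizon):
        V_next = {}
        for x in cells:
            if x in target:
                V_next[x] = 1.0     # specification already satisfied
            elif x in unsafe:
                V_next[x] = 0.0     # specification violated
            else:
                V_next[x] = worst_case_expectation(V, transitions[x])
        V = V_next
    return V


# Toy 1-D example: 5 abstract states, reach state 4 while avoiding state 0.
cells = range(5)
target, unsafe = {4}, {0}
transitions = {
    x: {max(x - 1, 0): (0.05, 0.15), x: (0.1, 0.2), min(x + 1, 4): (0.7, 0.8)}
    for x in cells
}
print(reach_avoid_lower_bound(cells, target, unsafe, transitions, horizon=10))
```
Each backward step takes the worst case over all transition distributions consistent with the intervals, so the returned values can only underestimate the true reach-avoid probability, which is the property needed for a safety certificate.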
Related papers
- Natural Actor-Critic for Robust Reinforcement Learning with Function
Approximation [20.43657369407846]
We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.
We propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric.
We demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
arXiv Detail & Related papers (2023-07-17T22:10:20Z)
- Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal
Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Certification of Iterative Predictions in Bayesian Neural Networks [79.15007746660211]
We compute lower bounds for the probability that trajectories of the BNN model reach a given set of states while avoiding a set of unsafe states.
We use the lower bounds in the context of control and reinforcement learning to provide safety certification for given control policies.
arXiv Detail & Related papers (2021-05-21T05:23:57Z)
- Safe Continuous Control with Constrained Model-Based Policy Optimization [0.0]
We introduce a model-based safe exploration algorithm for constrained high-dimensional control.
We also introduce a practical algorithm that accelerates policy search with model-generated data.
arXiv Detail & Related papers (2021-04-14T15:20:55Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.