Safe POMDP Online Planning via Shielding
- URL: http://arxiv.org/abs/2309.10216v2
- Date: Sat, 2 Mar 2024 15:49:21 GMT
- Title: Safe POMDP Online Planning via Shielding
- Authors: Shili Sheng, David Parker and Lu Feng
- Abstract summary: Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty.
POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return.
But the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks.
- Score: 6.234405592444883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially observable Markov decision processes (POMDPs) have been widely used
in many robotic applications for sequential decision-making under uncertainty.
POMDP online planning algorithms such as Partially Observable Monte-Carlo
Planning (POMCP) can solve very large POMDPs with the goal of maximizing the
expected return. But the resulting policies cannot provide safety guarantees,
which are imperative for real-world safety-critical tasks (e.g., autonomous
driving). In this work, we consider safety requirements represented as
almost-sure reach-avoid specifications (i.e., the probability to reach a set of
goal states is one and the probability to reach a set of unsafe states is
zero). We compute shields that restrict unsafe actions which would violate the
almost-sure reach-avoid specifications. We then integrate these shields into
the POMCP algorithm for safe POMDP online planning. We propose four distinct
shielding methods, differing in how the shields are computed and integrated,
including factored variants designed to improve scalability. Experimental
results on a set of benchmark domains demonstrate that the proposed shielding
methods successfully guarantee safety (unlike the baseline POMCP without
shielding) on large POMDPs, with negligible impact on the runtime for online
planning.
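To illustrate the integration step described above, here is a minimal Python sketch (not the authors' implementation) in which a precomputed shield, mapping each reachable belief support to the actions under which the almost-sure reach-avoid specification remains satisfiable, restricts the actions considered by POMCP's UCB1 selection. The class name `ShieldedPOMCP` and the shield's dictionary interface are assumptions made for illustration.

```python
import math
from collections import defaultdict


class ShieldedPOMCP:
    """Sketch of POMCP action selection restricted by a precomputed shield."""

    def __init__(self, actions, shield, exploration_c=1.0):
        self.actions = list(actions)   # full action set of the POMDP
        self.shield = shield           # maps frozenset of states (belief support) -> safe actions
        self.c = exploration_c
        self.N = defaultdict(int)      # visit count per (history, action)
        self.Q = defaultdict(float)    # value estimate per (history, action)
        self.Nh = defaultdict(int)     # visit count per history

    def allowed_actions(self, belief_particles):
        # The shield keeps only actions from which the almost-sure
        # reach-avoid specification remains satisfiable for the current
        # belief support; uncovered supports fall back to all actions
        # (an assumption of this sketch, not a claim about the paper).
        support = frozenset(belief_particles)
        return self.shield.get(support, set(self.actions))

    def select_action(self, history, belief_particles):
        safe = self.allowed_actions(belief_particles)

        def ucb(a):
            n = self.N[(history, a)]
            if n == 0:
                return math.inf
            return self.Q[(history, a)] + self.c * math.sqrt(
                math.log(self.Nh[history] + 1) / n)

        # UCB1 is applied to the shielded action set only, so actions that
        # could violate the specification are never simulated or executed.
        return max(safe, key=ucb)
```

Because the shield only prunes the action set at each decision point, the remainder of POMCP (particle filtering, rollouts, value backups) would be unchanged, which is in line with the abstract's claim of negligible impact on online planning runtime.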
Related papers
- ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints [34.9739641898452]
This work introduces the ConstrainedZero policy algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy.
Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
arXiv Detail & Related papers (2024-05-01T17:17:22Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Model-based Dynamic Shielding for Safe and Efficient Multi-Agent
Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
arXiv Detail & Related papers (2023-04-13T06:08:10Z) - Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield analyses the safety of each available action by estimating the probability of a safety violation.
Based on this probability and a given threshold, the shield decides whether to block an action from the agent (a minimal illustrative sketch follows this related-papers list).
arXiv Detail & Related papers (2022-12-04T16:00:29Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision making under uncertainty problems can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Safe reinforcement learning for probabilistic reachability and safety
specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.