Safe POMDP Online Planning via Shielding
- URL: http://arxiv.org/abs/2309.10216v2
- Date: Sat, 2 Mar 2024 15:49:21 GMT
- Title: Safe POMDP Online Planning via Shielding
- Authors: Shili Sheng, David Parker and Lu Feng
- Abstract summary: Partially observable Markov decision processes (POMDPs) have been widely used in many robotic applications for sequential decision-making under uncertainty.
POMDP online planning algorithms such as Partially Observable Monte-Carlo Planning (POMCP) can solve very large POMDPs with the goal of maximizing the expected return.
But the resulting policies cannot provide safety guarantees, which are imperative for real-world safety-critical tasks.
- Score: 6.234405592444883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially observable Markov decision processes (POMDPs) have been widely used
in many robotic applications for sequential decision-making under uncertainty.
POMDP online planning algorithms such as Partially Observable Monte-Carlo
Planning (POMCP) can solve very large POMDPs with the goal of maximizing the
expected return. But the resulting policies cannot provide safety guarantees,
which are imperative for real-world safety-critical tasks (e.g., autonomous
driving). In this work, we consider safety requirements represented as
almost-sure reach-avoid specifications (i.e., the probability to reach a set of
goal states is one and the probability to reach a set of unsafe states is
zero). We compute shields that restrict unsafe actions which would violate the
almost-sure reach-avoid specifications. We then integrate these shields into
the POMCP algorithm for safe POMDP online planning. We propose four distinct
shielding methods, differing in how the shields are computed and integrated,
including factored variants designed to improve scalability. Experimental
results on a set of benchmark domains demonstrate that the proposed shielding
methods successfully guarantee safety (unlike the baseline POMCP without
shielding) on large POMDPs, with negligible impact on the runtime for online
planning.
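To make the mechanism concrete, here is a minimal Python sketch of shield-restricted action selection in the spirit of the paper: a precomputed shield maps each belief support to the actions under which the almost-sure reach-avoid specification remains satisfiable, and the planner's UCB1 step only considers those actions. The `Node` and `Shield` structures and all names are illustrative assumptions, not the authors' implementation.

```python
import math

class Node:
    """POMCP search node, holding the belief support and UCB statistics."""
    def __init__(self, belief_support):
        self.belief_support = frozenset(belief_support)
        self.visits = 0
        self.stats = {}  # action -> (visit_count, mean_return)

class Shield:
    """Maps belief supports to the actions that keep the almost-sure
    reach-avoid specification satisfiable (assumed precomputed)."""
    def __init__(self, safe_actions):
        self.safe_actions = safe_actions  # dict[frozenset, set]

    def allowed(self, belief_support, all_actions):
        return self.safe_actions.get(belief_support, set(all_actions))

def shielded_ucb(node, shield, all_actions, c=1.4):
    """UCB1 action selection restricted to shield-approved actions."""
    best, best_val = None, -math.inf
    for a in shield.allowed(node.belief_support, all_actions):
        n, q = node.stats.get(a, (0, 0.0))
        val = math.inf if n == 0 else q + c * math.sqrt(math.log(node.visits) / n)
        if val > best_val:
            best, best_val = a, val
    return best

# Example: at support {s0, s1} only "wait" and "left" are provably safe,
# so the unsafe "right" is never even considered by the tree search.
shield = Shield({frozenset({"s0", "s1"}): {"wait", "left"}})
root = Node({"s0", "s1"})
root.visits = 10
root.stats = {"wait": (5, 1.0), "left": (5, 2.0), "right": (0, 0.0)}
print(shielded_ucb(root, shield, ["wait", "left", "right"]))  # "left"
```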
Related papers
- ConstrainedZero: Chance-Constrained POMDP Planning using Learned Probabilistic Failure Surrogates and Adaptive Safety Constraints [34.9739641898452]
This work introduces the ConstrainedZero policy algorithm that solves CC-POMDPs in belief space by learning neural network approximations of the optimal value and policy.
Results show that by separating safety constraints from the objective we can achieve a target level of safety without optimizing the balance between rewards and costs.
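As a hedged illustration of chance-constrained action selection (not the ConstrainedZero code), the sketch below picks the highest-value action whose predicted failure probability, given by a stand-in for the learned surrogate, stays under a target level:

```python
def select_action(belief, actions, value_fn, failure_fn, target=0.05):
    """Pick the highest-value action whose predicted failure probability
    stays below the target; fall back to the least-unsafe action."""
    feasible = [a for a in actions if failure_fn(belief, a) <= target]
    if not feasible:
        return min(actions, key=lambda a: failure_fn(belief, a))
    return max(feasible, key=lambda a: value_fn(belief, a))

# Toy surrogates standing in for the learned value and failure networks:
value = lambda b, a: {"fast": 10.0, "slow": 6.0, "stop": 0.0}[a]
fail = lambda b, a: {"fast": 0.20, "slow": 0.03, "stop": 0.00}[a]
print(select_action(None, ["fast", "slow", "stop"], value, fail))  # "slow"
```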
arXiv Detail & Related papers (2024-05-01T17:17:22Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We collect high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks, and similar to optimal handcrafted task-specific heuristics, at lower computational cost.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Safety Margins for Reinforcement Learning [74.13100479426424]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
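A rough sketch of the idea, with an assumed proxy: treat the gap between the best and the average Q-value as criticality, and convert it into a margin on how many random actions could be absorbed before losing more than a tolerance. The mapping below is an illustrative stand-in, not the paper's calibrated metric.

```python
import numpy as np

def proxy_criticality(q_values):
    """Gap between best and average action value: a large gap means a
    wrong action is costly, i.e., the state is critical."""
    q = np.asarray(q_values, dtype=float)
    return q.max() - q.mean()

def safety_margin(q_values, per_step_loss_tolerance=1.0):
    """Number of random actions tolerable before exceeding the tolerance,
    assuming (purely for illustration) each random step loses about the
    proxy criticality in expected return."""
    c = proxy_criticality(q_values)
    if c <= 0:
        return float("inf")  # all actions equally good: no risk
    return int(per_step_loss_tolerance / c)

print(safety_margin([10.0, 2.0, 2.0], per_step_loss_tolerance=12.0))  # 2
```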
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning [7.103977648997475]
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize reward but do not have safety guarantees during the learning and deployment phases.
We propose Model-based Dynamic Shielding (MBDS) to support MARL algorithm design.
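One way to picture dynamic shielding with a model (an illustrative sketch, not the MBDS construction): before committing to an action, check with the model whether any behavior within a short horizon can reach an unsafe state, and block the action if so.

```python
def unsafe_reachable(successors, state, action, unsafe, horizon=2):
    """successors[(s, a)] -> set of possible next states. After taking
    `action` in `state`, explore every behavior for `horizon` further
    steps (worst case) and report whether an unsafe state is reachable."""
    frontier = set(successors[(state, action)])
    for _ in range(horizon):
        if frontier & unsafe:
            return True
        frontier = {s2 for s in frontier
                    for (s1, _), nxt in successors.items() if s1 == s
                    for s2 in nxt}
    return bool(frontier & unsafe)

# Toy two-agent fragment: "near" can reach "collision" in one step,
# so "go" from "far" is blocked under a horizon-2 worst-case check.
succ = {("far", "go"): {"near"}, ("far", "wait"): {"far"},
        ("near", "go"): {"collision"}, ("near", "wait"): {"near"},
        ("collision", "go"): {"collision"}, ("collision", "wait"): {"collision"}}
print(unsafe_reachable(succ, "far", "go", {"collision"}))  # True -> block "go"
```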
arXiv Detail & Related papers (2023-04-13T06:08:10Z) - Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield estimates, for each available action, the probability of violating safety.
Based on this probability and a given threshold, the shield decides whether to block the action from the agent.
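A minimal sketch of that runtime check, under assumed names (a real shield would derive the violation probabilities from a model of the environment):

```python
def shield_filter(actions, violation_prob, threshold=0.1):
    """Keep actions whose estimated probability of reaching an unsafe
    state is at most `threshold`; if all exceed it, keep the least risky
    action rather than leaving the agent with nothing."""
    allowed = [a for a in actions if violation_prob(a) <= threshold]
    return allowed or [min(actions, key=violation_prob)]

risk = {"brake": 0.01, "coast": 0.08, "accelerate": 0.35}
print(shield_filter(list(risk), risk.get))  # ['brake', 'coast']
```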
arXiv Detail & Related papers (2022-12-04T16:00:29Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
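For a worked example of one coherent risk measure that such objectives build on, the snippet below computes Conditional Value-at-Risk (CVaR), the mean of the worst alpha-fraction of costs; the policies and numbers are made up for illustration:

```python
import numpy as np

def cvar(costs, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled costs."""
    c = np.sort(np.asarray(costs, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * len(c))))
    return c[:k].mean()

# Two hypothetical policies with equal mean cost but different tails:
rng = np.random.default_rng(0)
safe_policy = rng.normal(10.0, 1.0, 10000)
risky_policy = np.where(rng.random(10000) < 0.05, 50.0, 7.9)
print(cvar(safe_policy), cvar(risky_policy))  # the rare-disaster tail shows up in CVaR
```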
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Safe RAN control: A Symbolic Reinforcement Learning Approach [62.997667081978825]
We present a Symbolic Reinforcement Learning (SRL) based architecture for safety control of Radio Access Network (RAN) applications.
We provide a purely automated procedure in which a user can specify high-level logical safety specifications for a given cellular network topology.
We introduce a user interface (UI) developed to help a user set intent specifications for the system and inspect differences in the agent's proposed actions.
arXiv Detail & Related papers (2021-06-03T16:45:40Z) - Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
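A toy sketch of the second contribution on Tiger, with an assumed rule format (the paper expresses such rules in a logical language; this hard-coded version is only illustrative): the shield walks down POMCP's ranking and returns the first action the rule does not flag as unexpected.

```python
def tiger_rule(belief, action):
    """Expert-style rule (illustrative): only open a door when the belief
    that the tiger is behind the *other* door is at least 0.9."""
    p_left = belief["tiger-left"]
    if action == "open-left":
        return p_left <= 0.1
    if action == "open-right":
        return p_left >= 0.9
    return True  # listening is always acceptable

def shielded_choice(belief, ranked_actions, rule=tiger_rule):
    """ranked_actions: actions ordered by POMCP's estimated value.
    Return the best-ranked action that the rule does not flag."""
    for a in ranked_actions:
        if rule(belief, a):
            return a
    return "listen"  # conservative fallback

print(shielded_choice({"tiger-left": 0.5}, ["open-left", "listen"]))  # listen
```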
arXiv Detail & Related papers (2021-04-28T14:23:38Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA (limit-deterministic generalized Büchi automaton) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
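A minimal sketch of the product idea (omitting the tracking-frontier machinery that distinguishes the EP-MDP, and with assumed interfaces): pair each MDP state with an automaton state and step both together, so acceptance information is available for reward shaping.

```python
def product_step(mdp_step, automaton_delta, label, state, q, action):
    """One transition of the product: the MDP moves, and the automaton
    reads the label of the successor MDP state."""
    s_next = mdp_step(state, action)            # move in the MDP
    q_next = automaton_delta(q, label(s_next))  # automaton reads the new label
    return s_next, q_next

# Tiny demo: a two-state automaton that latches once label "g" (goal) is seen.
delta = {("q0", "g"): "q1", ("q0", "-"): "q0",
         ("q1", "g"): "q1", ("q1", "-"): "q1"}
label = lambda s: "g" if s == "goal" else "-"
step = lambda s, a: "goal" if a == "go" else s
print(product_step(step, lambda q, l: delta[(q, l)], label, "start", "q0", "go"))
# ('goal', 'q1')
```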
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
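An illustrative sketch of the constraint pattern, with assumed tabular estimates: greedy improvement is restricted to actions that keep the estimated probability of becoming unsafe at or below the current Lyapunov value, so each stage stays within the safe set.

```python
def constrained_improvement(q_unsafe, q_return, w, states, actions):
    """q_unsafe[(s, a)]: estimated unsafety probability after taking a in s.
    q_return[(s, a)]: estimated return. w[s]: current Lyapunov value.
    Greedy improvement restricted to actions with unsafety <= w[s]."""
    policy = {}
    for s in states:
        ok = [a for a in actions if q_unsafe[(s, a)] <= w[s]]
        ok = ok or [min(actions, key=lambda a: q_unsafe[(s, a)])]
        policy[s] = max(ok, key=lambda a: q_return[(s, a)])
    return policy

# Toy example: in state s1 the high-return action is too unsafe to keep.
states, actions = ["s1"], ["risky", "careful"]
q_unsafe = {("s1", "risky"): 0.30, ("s1", "careful"): 0.05}
q_return = {("s1", "risky"): 9.0, ("s1", "careful"): 5.0}
print(constrained_improvement(q_unsafe, q_return, {"s1": 0.10}, states, actions))
# {'s1': 'careful'}
```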
arXiv Detail & Related papers (2020-02-24T09:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.