Q-learning-based Model-free Safety Filter
- URL: http://arxiv.org/abs/2411.19809v1
- Date: Fri, 29 Nov 2024 16:16:59 GMT
- Title: Q-learning-based Model-free Safety Filter
- Authors: Guo Ning Sue, Yogita Choudhary, Richard Desatnik, Carmel Majidi, John Dolan, Guanya Shi
- Abstract summary: This paper proposes a simple, plug-and-play, and effective model-free safety filter learning framework.
We introduce a novel reward formulation and use Q-learning to learn Q-value functions to safeguard arbitrary task-specific nominal policies.
- Score: 6.391687991642366
- Abstract: Ensuring safety via safety filters in real-world robotics presents significant challenges, particularly when the system dynamics are complex or unavailable. To handle this issue, learning-based safety filters have recently gained popularity; they can be classified as model-based or model-free methods. Existing model-based approaches require various assumptions on the system model (e.g., control-affine), which limits their application to complex systems, and existing model-free approaches need substantial modifications to standard RL algorithms and lack versatility. This paper proposes a simple, plug-and-play, and effective model-free safety filter learning framework. We introduce a novel reward formulation and use Q-learning to learn Q-value functions that safeguard arbitrary task-specific nominal policies by filtering out their potentially unsafe actions. The threshold used in the filtering process is supported by our theoretical analysis. Due to its model-free nature and simplicity, our framework can be seamlessly integrated with various RL algorithms. We validate the proposed approach through simulations on double integrator and Dubins car systems and demonstrate its effectiveness in real-world experiments with a soft robotic limb.
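The filtering idea in the abstract can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional toy system, the reward (-1 on entering an unsafe state, 0 otherwise), the threshold of -0.5, and the names `step`, `train_safety_q`, and `safety_filter`. The paper's own reward formulation and theoretically supported threshold are not given in the abstract and are not reproduced.

```python
import random

# Hypothetical toy system: integer positions 0..6, both end positions are unsafe.
N_STATES = 7
ACTIONS = (-1, 0, 1)
UNSAFE = {0, N_STATES - 1}
GAMMA = 0.9


def step(state, action):
    """Deterministic toy dynamics; returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    failed = nxt in UNSAFE
    # Assumed safety reward (not the paper's): -1 on failure, 0 otherwise.
    return nxt, (-1.0 if failed else 0.0), failed


def train_safety_q(episodes=2000, alpha=0.5, seed=0):
    """Standard tabular Q-learning with uniform random exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    safe_states = [s for s in range(N_STATES) if s not in UNSAFE]
    for _ in range(episodes):
        s = rng.choice(safe_states)
        for _ in range(20):
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            # Bootstrap from the best next action unless the episode ended.
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            s = s2
    return q


def safety_filter(q, state, nominal_action, threshold=-0.5):
    """Pass the nominal action through unless its Q-value falls below the threshold."""
    if q[(state, nominal_action)] >= threshold:
        return nominal_action
    # Fall back to the action the safety Q-function rates highest.
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train_safety_q()
    for s, a in [(3, -1), (1, -1), (5, 1)]:
        print(f"state={s} nominal={a} filtered={safety_filter(q, s, a)}")
```

The filter only consults the learned Q-table, so the nominal policy can come from any task-specific controller or RL algorithm. In this toy setting, a nominal action far from the boundary passes through unchanged, while one that would step into an unsafe position is replaced.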
Related papers
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z)
- Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning [1.3678669691302048]
This paper proposes a novel approach for the use of deep reinforcement learning in safety-critical systems.
It combines the advantages of probabilistic modeling and reinforcement learning with the added benefits of interpretability.
arXiv Detail & Related papers (2023-10-28T20:30:57Z)
- In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z)
- Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models [13.077993395762185]
Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences.
We study the benefits and challenges of using a learned dynamics model when performing PbRL.
arXiv Detail & Related papers (2023-01-11T22:22:54Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show that this algorithm, meta-learned adversarial censoring (MLAC), can largely prevent a BERT-style model from being repurposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models [16.511440197186918]
We propose a new method to combine model-based safety with model-free reinforcement learning.
We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system.
We illustrate that the identified linear model is able to provide guarantees through a safety-critical optimal control framework.
arXiv Detail & Related papers (2022-05-11T22:03:18Z)
- Safety-aware Policy Optimisation for Autonomous Racing [17.10371721305536]
We introduce Hamilton-Jacobi (HJ) reachability theory into the constrained Markov decision process (CMDP) framework.
We demonstrate that the HJ safety value can be learned directly from visual context.
We evaluate our method on several benchmark tasks, including Safety Gym and Learn-to-Race (L2R), a recently released high-fidelity autonomous racing environment.
arXiv Detail & Related papers (2021-10-14T20:15:45Z)
- Partitioned Active Learning for Heterogeneous Systems [5.331649110169476]
We propose the partitioned active learning strategy established upon partitioned GP (PGP) modeling.
A global searching scheme accelerates the exploration aspect of active learning.
Local searching exploits the active learning criterion induced by the local GP model.
arXiv Detail & Related papers (2021-05-14T02:05:31Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.