Safe Deep Policy Adaptation
- URL: http://arxiv.org/abs/2310.08602v3
- Date: Sun, 28 Apr 2024 18:04:30 GMT
- Title: Safe Deep Policy Adaptation
- Authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi
- Abstract summary: Policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges.
We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning.
We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations.
- Score: 7.2747306035142225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF), placed on top of the RL policy, is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate the superiority of SafeDPA over state-of-the-art baselines in both safety and task performance. In particular, SafeDPA shows notable generalizability, achieving a 300% higher safety rate than the baselines under unseen disturbances in real-world experiments.
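The CBF safety-filter layer described above can be pictured with a minimal sketch: for a control-affine system the CBF condition is affine in the control, so the filter reduces to a small QP that minimally modifies the RL action, and with a single constraint the QP has a closed-form projection. The system, constraint coefficients, and numbers below are illustrative assumptions, not SafeDPA's actual implementation.

```python
import numpy as np

def cbf_safety_filter(u_rl, a, b):
    """Minimally modify the RL action u_rl so that a @ u >= b.

    For a control-affine system x_dot = f(x) + g(x) u and a barrier h(x),
    the CBF condition  dh/dx @ (f + g u) + alpha(h) >= 0  is affine in u,
    i.e. a @ u >= b.  With a single constraint, the QP
        min_u ||u - u_rl||^2   s.t.  a @ u >= b
    has the closed-form projection below.
    """
    slack = b - a @ u_rl
    if slack <= 0.0:            # RL action already satisfies the CBF condition
        return u_rl
    return u_rl + (slack / (a @ a)) * a   # project onto the constraint boundary

# Illustrative use: 2-D action, hypothetical constraint coefficients.
u_rl = np.array([1.0, -0.5])
a = np.array([0.8, 0.3])        # a = dh/dx @ g(x) at the current state (assumed)
b = 0.9                         # b = -dh/dx @ f(x) - alpha(h(x))      (assumed)
u_safe = cbf_safety_filter(u_rl, a, b)
print(u_safe, a @ u_safe)       # a @ u_safe lands on the constraint boundary b
```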
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems [15.863561935347692]
We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems.
Recent advances at the intersection of control and RL follow a two-stage, safety-filter approach to enforcing hard safety constraints.
We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
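A generic sketch of single-stage, sampling-based hard-constraint satisfaction (an illustration of the idea under assumed dynamics and constraints, not the paper's algorithm): sample candidate actions around the learned policy, discard any candidate whose predicted successor state violates the constraint, and execute the feasible candidate closest to the policy's proposal.

```python
import numpy as np

def step(x, u, dt=0.05):
    """Assumed known dynamics: damped double integrator, x = [position, velocity]."""
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * (u - 0.1 * vel)])

def constraint_ok(x, pos_limit=1.0):
    """Hard state constraint: |position| <= pos_limit."""
    return abs(x[0]) <= pos_limit

def sampled_safe_action(x, policy_mean, n_samples=256, sigma=0.3, rng=None):
    """Single-stage, sampling-based action selection under a hard constraint."""
    rng = rng or np.random.default_rng(0)
    candidates = policy_mean + sigma * rng.standard_normal(n_samples)
    feasible = [u for u in candidates if constraint_ok(step(x, u))]
    if not feasible:                       # no feasible sample: brake (assumed backup)
        return -np.sign(x[1]) * 1.0
    return min(feasible, key=lambda u: abs(u - policy_mean))

x = np.array([0.9, 0.5])                   # near the position limit, moving toward it
print(sampled_safe_action(x, policy_mean=0.8))
```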
arXiv Detail & Related papers (2024-03-06T19:39:20Z)
- Modular Control Architecture for Safe Marine Navigation: Reinforcement Learning and Predictive Safety Filters [0.0]
Reinforcement learning is increasingly used to adapt to complex scenarios, but standard frameworks ensuring safety and stability are lacking.
Predictive Safety Filters (PSF) offer a promising solution, ensuring constraint satisfaction in learning-based control without explicit constraint handling.
We apply this approach to marine navigation, combining RL with PSF on a simulated Cybership II model.
Results demonstrate the PSF's effectiveness in maintaining safety without hindering the RL agent's learning rate and performance, compared with a standard RL agent trained without the PSF.
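A heavily simplified, rollout-based stand-in for a predictive safety filter (the vessel surrogate and backup law below are assumptions, not the paper's MPC formulation or the Cybership II model): before executing the learned action, simulate it once and then a backup controller over a short horizon; pass the action through only if the predicted trajectory stays within the constraints.

```python
import numpy as np

def dynamics(x, u, dt=0.1):
    """Assumed 1-D vessel surrogate: x = [position, velocity], u = thrust."""
    return np.array([x[0] + dt * x[1], x[1] + dt * (u - 0.2 * x[1])])

def backup(x):
    """Assumed backup controller: decelerate toward zero velocity."""
    return -2.0 * x[1]

def safe(x, limit=5.0):
    return abs(x[0]) <= limit

def predictive_safety_filter(x, u_rl, horizon=20):
    """Certify u_rl by simulating it once, then the backup law for `horizon` steps."""
    z = dynamics(x, u_rl)
    traj_ok = safe(z)
    for _ in range(horizon):
        z = dynamics(z, backup(z))
        traj_ok = traj_ok and safe(z)
    return u_rl if traj_ok else backup(x)   # pass the RL action through only if certified

x = np.array([4.5, 1.0])
print(predictive_safety_filter(x, u_rl=3.0))
```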
arXiv Detail & Related papers (2023-12-04T12:37:54Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
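The decoupling can be sketched generically (the stand-in policies and correction rule below are assumptions, not the paper's architecture): a baseline agent proposes a task action, a separate safe agent outputs a correction conditioned on the state and that proposal, and only the combined action is executed.

```python
import numpy as np

rng = np.random.default_rng(0)

def baseline_agent(state):
    """Stand-in for a task-driven policy network (assumed)."""
    return np.tanh(rng.standard_normal(2))           # nominal 2-D action

def safe_agent(state, proposed_action):
    """Stand-in for a risk-aware correction policy (assumed).

    It sees both the state and the baseline's proposal and returns a bounded
    correction; here it simply damps the proposal when close to an obstacle.
    """
    distance_to_obstacle = state[0]
    scale = np.clip(distance_to_obstacle, 0.0, 1.0)  # shrink actions when close
    return (scale - 1.0) * proposed_action           # correction in [-proposal, 0]

def act(state):
    u_task = baseline_agent(state)
    u_corr = safe_agent(state, u_task)
    return u_task + u_corr                            # executed, risk-aware action

print(act(np.array([0.3, 0.0])))
```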
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- ISAACS: Iterative Soft Adversarial Actor-Critic for Safety [0.9217021281095907]
This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems.
A safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error.
While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter.
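The real-time safety filter built from the learned policy can be sketched as a least-restrictive switch (a generic illustration with assumed components, not the ISAACS implementation): a learned safety critic scores the margin of the current state-action pair, and the fallback policy overrides the task policy only when that margin drops below a threshold.

```python
import numpy as np

def safety_critic(state, action):
    """Assumed learned safety value: positive margin means 'still recoverable'."""
    return 1.0 - abs(state[0] + 0.1 * action)         # toy stand-in for a trained critic

def fallback_policy(state):
    """Assumed safety-seeking fallback, e.g. trained against worst-case disturbance."""
    return -np.sign(state[0])                          # push back toward the origin

def task_policy(state):
    return 1.0                                         # toy stand-in for the task policy

def filtered_action(state, margin=0.2):
    """Least-restrictive safety filter: intervene only when the margin is low."""
    u_task = task_policy(state)
    if safety_critic(state, u_task) >= margin:
        return u_task                                  # task action judged safe enough
    return fallback_policy(state)                      # otherwise hand control to fallback

print(filtered_action(np.array([0.95])))
```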
arXiv Detail & Related papers (2022-12-06T18:53:34Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperforms CMDP-based baseline methods in system safety rate, as measured in simulation.
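The soft-barrier idea of enforcing a hard constraint through a smooth penalty can be illustrated with a log-barrier-style shaping term (a sketch of the general technique under assumed dynamics, not the paper's specific barrier construction): the penalty is near zero far from the constraint boundary and grows steeply as the margin shrinks.

```python
import numpy as np

def constraint_margin(state, pos_limit=1.0):
    """Signed margin to the hard constraint |position| <= pos_limit."""
    return pos_limit - abs(state[0])

def soft_barrier_penalty(margin, eps=1e-3, weight=0.1):
    """Smooth penalty that grows steeply as the margin approaches zero.

    Near the limit the penalty dominates the task reward (clipped at eps for
    numerical safety), approximating a hard constraint with a differentiable term.
    """
    return weight * -np.log(np.maximum(margin, eps))

def shaped_reward(task_reward, state):
    return task_reward - soft_barrier_penalty(constraint_margin(state))

for pos in (0.0, 0.7, 0.95, 0.999):
    print(pos, shaped_reward(1.0, np.array([pos])))
```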
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
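The event-triggered data-collection idea can be sketched as follows (a simplified illustration with assumed quantities, not the paper's trigger conditions): monitor the slack of an uncertainty-aware CBF condition along the trajectory, and record new data for model updates only when that slack falls below a trigger threshold.

```python
import numpy as np

def cbf_slack(a, b, u, uncertainty_bound):
    """Slack of an uncertainty-aware CBF condition  a @ u - b >= uncertainty_bound."""
    return a @ u - b - uncertainty_bound

def run_episode(stream, trigger=0.05):
    """Event-triggered data collection: log data only when the margin gets small."""
    dataset = []
    uncertainty_bound = 0.3                    # assumed bound on the model error
    for a, b, u in stream:
        if cbf_slack(a, b, u, uncertainty_bound) < trigger:
            dataset.append((a, b, u))          # trigger fired: collect this point
            uncertainty_bound *= 0.9           # stand-in for re-fitting the model
    return dataset

# Illustrative stream of (constraint coefficients, offset, applied action).
stream = [(np.array([1.0, 0.0]), 0.2, np.array([0.6, 0.0])),
          (np.array([1.0, 0.0]), 0.5, np.array([0.6, 0.0]))]
print(len(run_episode(stream)))
```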
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
- Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings [129.80279257258098]
Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous.
We propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments.
We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk.
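The intuition that diverse prior experience equips an agent to estimate risk can be sketched with an ensemble of dynamics models fit in the source environments (a generic illustration of risk-aware action selection, not CARL's exact objective): penalize actions whose predicted outcomes the ensemble disagrees on, or that any member predicts will land near a hazard.

```python
import numpy as np

# Assumed: an ensemble of one-step dynamics models fit in diverse source envs.
def make_model(drag):
    def model(x, u, dt=0.3):
        acc = u - drag * x[1]
        return np.array([x[0] + dt * x[1] + 0.5 * dt**2 * acc, x[1] + dt * acc])
    return model

ensemble = [make_model(d) for d in (0.05, 0.1, 0.2)]

def risk(x, u, hazard_pos=1.0):
    """Risk = ensemble disagreement + worst-case proximity to the hazard."""
    preds = np.stack([m(x, u) for m in ensemble])
    disagreement = preds.std(axis=0).sum()
    worst_margin = hazard_pos - preds[:, 0].max()      # smallest predicted clearance
    return disagreement + max(0.0, 0.2 - worst_margin)

def cautious_action(x, candidates, value_fn, risk_weight=5.0):
    """Pick the candidate trading off task value against estimated risk."""
    scores = [value_fn(x, u) - risk_weight * risk(x, u) for u in candidates]
    return candidates[int(np.argmax(scores))]

value_fn = lambda x, u: u                               # toy task value: prefer large thrust
print(cautious_action(np.array([0.8, 0.5]), np.linspace(-1, 1, 9), value_fn))
```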
arXiv Detail & Related papers (2020-08-15T01:40:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.