Provably Safe Deep Reinforcement Learning for Robotic Manipulation in
Human Environments
- URL: http://arxiv.org/abs/2205.06311v1
- Date: Thu, 12 May 2022 18:51:07 GMT
- Title: Provably Safe Deep Reinforcement Learning for Robotic Manipulation in
Human Environments
- Authors: Jakob Thumm and Matthias Althoff
- Abstract summary: We propose a shielding mechanism that ensures ISO-verified human safety while training and deploying RL algorithms on manipulators.
We utilize a fast reachability analysis of humans and manipulators to guarantee that the manipulator comes to a complete stop before a human is within its range.
- Score: 8.751383865142772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (RL) has shown promising results in the motion
planning of manipulators. However, no method guarantees the safety of highly
dynamic obstacles, such as humans, in RL-based manipulator control. This lack
of formal safety assurances prevents the application of RL for manipulators in
real-world human environments. Therefore, we propose a shielding mechanism that
ensures ISO-verified human safety while training and deploying RL algorithms on
manipulators. We utilize a fast reachability analysis of humans and
manipulators to guarantee that the manipulator comes to a complete stop before
a human is within its range. Our proposed method guarantees safety and
significantly improves the RL performance by preventing episode-ending
collisions. We demonstrate the performance of our proposed method in simulation
using human motion capture data.
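
The shielding idea described in the abstract - over-approximate where the human can move and where the manipulator will sweep while braking, and override the learned action with a verified stop whenever those regions could intersect - can be conveyed with a minimal sketch. The ball-shaped over-approximations and every name and parameter below (human_reach_set, robot_stop_set, shielded_action, v_human_max, t_brake) are illustrative assumptions, not the authors' ISO-verified implementation.

```python
import numpy as np

def human_reach_set(human_pos, v_human_max, horizon):
    """Over-approximate every position a human can occupy within `horizon`
    seconds as a ball around the current position (worst-case speed bound)."""
    return np.asarray(human_pos), v_human_max * horizon  # (center, radius)

def robot_stop_set(ee_pos, ee_vel, braking_time):
    """Over-approximate the Cartesian region the end effector can sweep while
    executing its failsafe braking trajectory."""
    return np.asarray(ee_pos), np.linalg.norm(ee_vel) * braking_time

def balls_intersect(set_a, set_b):
    (ca, ra), (cb, rb) = set_a, set_b
    return np.linalg.norm(ca - cb) <= ra + rb

def shielded_action(policy_action, state, v_human_max=2.0, t_brake=0.3):
    """Pass the RL action through only if the manipulator can provably stop
    before the human is within its range; otherwise command a failsafe stop."""
    human_set = human_reach_set(state["human_pos"], v_human_max, t_brake)
    robot_set = robot_stop_set(state["ee_pos"], state["ee_vel"], t_brake)
    if balls_intersect(human_set, robot_set):
        return np.zeros_like(policy_action)  # failsafe: brake to a complete stop
    return policy_action                     # action verified safe for this step
```

Because the shield only ever tightens the commanded motion and never injects new behavior, it can wrap an RL policy both during training and at deployment, which is consistent with the abstract's claim that preventing episode-ending collisions also improves learning performance.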
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Stable and Safe Human-aligned Reinforcement Learning through Neural Ordinary Differential Equations [1.5413714916429737]
This paper provides safety and stability definitions for such human-aligned tasks.
An algorithm that leverages neural ordinary differential equations (NODEs) to predict human and robot movements is proposed.
Simulation results show that the algorithm helps the controlled robot to reach the desired goal state with fewer safety violations.
arXiv Detail & Related papers (2024-01-23T23:50:19Z)
- HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving [2.807187711407621]
We propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework.
We first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM).
In this paradigm, the human expert serves as a mentor to the AI agent, and the agent is guided to minimize traffic flow disturbance.
arXiv Detail & Related papers (2024-01-06T08:30:14Z)
- Reinforcement Learning for Safe Robot Control using Control Lyapunov Barrier Functions [9.690491406456307]
Reinforcement learning (RL) exhibits impressive performance when managing complicated control tasks for robots.
This paper explores the control Lyapunov barrier function (CLBF) to analyze the safety and reachability solely based on data.
We also propose the Lyapunov barrier actor-critic (LBAC) to search for a controller that satisfies the data-based approximation of the safety and reachability conditions.
arXiv Detail & Related papers (2023-05-16T20:27:02Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic, which only estimates constraint-free returns (see the sketch after this list).
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Reinforcement Learning using Data-Driven Predictive Control [0.5459797813771499]
We propose a data-driven safety layer that acts as a filter for unsafe actions.
The safety layer penalizes the RL agent if the proposed action is unsafe and replaces it with the closest safe one.
In simulation, we show that our method outperforms state-of-the-art safe RL methods on a robot navigation problem.
arXiv Detail & Related papers (2022-11-20T17:10:40Z)
- Constrained Reinforcement Learning for Robotics via Scenario-Based Programming [64.07167316957533]
It is crucial to optimize the performance of DRL-based agents while providing guarantees about their behavior.
This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop.
Our experiments demonstrate that using our approach to leverage expert knowledge dramatically improves the safety and the performance of the agent.
arXiv Detail & Related papers (2022-06-20T07:19:38Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
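
As referenced in the multiplicative value function entry above, the composition of a safety critic and a reward critic can be illustrated with a short sketch. The network sizes, class names, and dimensions below are assumptions for illustration and do not reproduce that paper's implementation.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """A small state-value network; an optional output activation lets the
    safety critic emit a probability in [0, 1]."""
    def __init__(self, obs_dim, out_activation=None):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.out_activation = out_activation

    def forward(self, obs):
        x = self.net(obs)
        return self.out_activation(x) if self.out_activation else x

obs_dim = 8
reward_critic = Critic(obs_dim)                 # estimates constraint-free return
safety_critic = Critic(obs_dim, torch.sigmoid)  # estimates probability of staying constraint-free

def multiplicative_value(obs):
    # The combined value is large only where the state is both rewarding and
    # likely to remain constraint-free, so unsafe states are discounted.
    return safety_critic(obs) * reward_critic(obs)

obs = torch.randn(4, obs_dim)
print(multiplicative_value(obs).shape)  # torch.Size([4, 1])
```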
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.