Towards Safe Continuing Task Reinforcement Learning
- URL: http://arxiv.org/abs/2102.12585v1
- Date: Wed, 24 Feb 2021 22:12:25 GMT
- Title: Towards Safe Continuing Task Reinforcement Learning
- Authors: Miguel Calvo-Fullana, Luiz F. O. Chamon, Santiago Paternain
- Abstract summary: We propose an algorithm capable of operating in the continuing task setting without the need for restarts.
We evaluate our approach on a numerical example, which demonstrates the ability of the proposed approach to learn safe policies via safe exploration.
- Score: 21.390201009230246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is a critical feature of controller design for physical systems. When
designing control policies, several approaches have been proposed to guarantee
this aspect of autonomy, such as robust controllers or control barrier
functions. However, these solutions rely strongly on a model of the system
being available to the designer. As a parallel development, reinforcement
learning provides model-agnostic control solutions but, in general, it lacks the
theoretical guarantees required for safety. Recent advances show that under
mild conditions, control policies can be learned via reinforcement learning,
which can be guaranteed to be safe by imposing these requirements as
constraints of an optimization problem. However, to transfer from learning
safety to learning safely, there are two hurdles that need to be overcome: (i)
it has to be possible to learn the policy without having to re-initialize the
system; and (ii) the rollouts of the system need to be in themselves safe. In
this paper, we tackle the first issue, proposing an algorithm capable of
operating in the continuing task setting without the need for restarts. We
evaluate our approach on a numerical example, which demonstrates the ability of
the proposed approach to learn safe policies via safe exploration.
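As a concrete illustration of the constrained approach the abstract describes, here is a minimal sketch: tabular Q-learning run as a single continuing trajectory (no restarts), with safety posed as a constraint on the long-run cost rate and handled by an online dual update. The chain MDP, step sizes, and cost budget below are illustrative assumptions, not the paper's algorithm or experiment.

```python
import numpy as np

# Primal-dual constrained RL on a toy chain MDP, run as one continuing
# trajectory with no resets. Reward is earned at the right end of the chain,
# visiting the left end incurs a safety cost, and a Lagrange multiplier
# "lam" is adapted online to keep the observed cost rate under a budget.
rng = np.random.default_rng(0)
n_states, actions = 10, (-1, +1)
Q = np.zeros((n_states, len(actions)))   # critic for reward - lam * cost
lam, cost_budget = 0.0, 0.05             # dual variable and allowed cost rate
alpha, eta, gamma, eps = 0.1, 0.01, 0.95, 0.1

s = n_states // 2
for t in range(50_000):
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next = int(np.clip(s + actions[a], 0, n_states - 1))
    reward = 1.0 if s_next == n_states - 1 else 0.0
    cost = 1.0 if s_next == 0 else 0.0   # entering state 0 is "unsafe"
    # Primal step: TD update on the Lagrangian reward.
    target = (reward - lam * cost) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    # Dual step: raise lam when the observed cost exceeds the budget.
    lam = max(0.0, lam + eta * (cost - cost_budget))
    s = s_next                           # continuing task: no restart

print(f"final multiplier lam = {lam:.3f}")
```

A multiplier that settles near zero indicates the learned policy keeps the cost rate within the budget; a persistently positive multiplier would signal an active safety constraint.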
Related papers
- Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning [2.422636931175853]
A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup controller.
Existing RTA design strategies are well known to be overly conservative and, in principle, can lead to safety violations.
In this paper, we formulate the optimal RTA design problem and present a new approach for solving it.
arXiv Detail & Related papers (2023-10-06T14:45:57Z)
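A minimal sketch of the runtime-assurance pattern this entry describes, with an assumed double-integrator plant, braking-distance monitor, and braking backup controller (the paper's reachability-based design is not reproduced):

```python
U_MAX = 2.0  # assumed actuation limit

def simulate(x, u, dt=0.1):
    """Euler step of a double-integrator plant (placeholder model)."""
    pos, vel = x
    return (pos + vel * dt, vel + u * dt)

def is_safe(x):
    # Conservative monitor: position plus the worst-case braking distance
    # vel^2 / (2 * U_MAX) must remain inside the safe set [-1, 1].
    pos, vel = x
    return abs(pos) + vel * vel / (2 * U_MAX) <= 1.0

def backup(x):
    """Backup controller: brake to a stop, then hold."""
    _, vel = x
    return -U_MAX if vel > 0 else (U_MAX if vel < 0 else 0.0)

def rta_step(x, untrusted):
    u = untrusted(x)
    # The untrusted controller acts only if the monitor certifies the
    # predicted next state; otherwise the backup controller takes over.
    return u if is_safe(simulate(x, u)) else backup(x)

x, max_pos = (0.0, 0.0), 0.0
for _ in range(200):
    x = simulate(x, rta_step(x, untrusted=lambda _x: U_MAX))  # always accelerate
    max_pos = max(max_pos, abs(x[0]))
print(f"max |position| under RTA: {max_pos:.2f} (safe set is [-1, 1])")
```

The conservatism the entry mentions is visible here: the monitor triggers well before the boundary because it budgets for a full worst-case stop.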
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
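A schematic reading of the decoupled baseline/safe-agent idea, not the paper's architecture: a baseline policy optimizes the task while a separate safe agent takes over whenever an assumed risk estimate crosses a threshold.

```python
from dataclasses import dataclass

@dataclass
class DualAgentPolicy:
    baseline: callable    # state -> task-optimal action
    safe_agent: callable  # state -> cautious corrective action
    risk: callable        # (state, action) -> predicted risk score
    threshold: float = 0.5

    def act(self, state):
        a = self.baseline(state)
        if self.risk(state, a) > self.threshold:
            return self.safe_agent(state)  # risk-aware override
        return a

# Toy instantiation: push a scalar state toward 2.0, but treat any move
# landing beyond 1.4 as risky and retreat instead.
policy = DualAgentPolicy(
    baseline=lambda s: min(max(2.0 - s, -1.0), 1.0),
    safe_agent=lambda s: -0.5,
    risk=lambda s, a: float(abs(s + 0.1 * a) > 1.4),
)
s = 0.0
for _ in range(60):
    s += 0.1 * policy.act(s)
print(f"state oscillates safely around {s:.2f}, below the 1.4 boundary")
```

Decoupling the two roles is what gives the flexibility noted above: the baseline can be trained or swapped without retraining the safety logic.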
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
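The safety-projection half of that combination can be illustrated with the standard closed-form action projection used by state-wise safety layers. This is a generic sketch based on the abstract alone, not USL's exact method; the linear cost model is assumed.

```python
import numpy as np

def project_action(a, g, c):
    """L2 projection of action a onto {a : c + g.a <= 0}, i.e. onto the
    actions whose predicted instantaneous cost is non-positive."""
    violation = c + g @ a
    if violation <= 0:
        return a                        # already safe: leave action unchanged
    return a - (violation / (g @ g)) * g

a = np.array([1.0, 0.5])                # raw action from the RL policy
g = np.array([2.0, 0.0])                # assumed cost gradient w.r.t. action
c = -1.0                                # assumed predicted cost at a = 0
safe_a = project_action(a, g, c)
print(safe_a, "constraint satisfied:", c + g @ safe_a <= 1e-9)
```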
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
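For context, here is a bare-bones control-barrier-function filter for a scalar input, following the generic CBF-QP recipe rather than the paper's model-uncertainty-aware reformulation; the dynamics and barrier are toy assumptions.

```python
def cbf_filter(u_nom, h, Lfh, Lgh, alpha=1.0):
    """Closed-form solution of min (u - u_nom)^2 s.t. Lfh + Lgh*u >= -alpha*h,
    which enforces the CBF decrease condition dh/dt >= -alpha * h(x)."""
    if Lfh + Lgh * u_nom + alpha * h >= 0:
        return u_nom                     # nominal control is already safe
    return (-alpha * h - Lfh) / Lgh      # active constraint (requires Lgh != 0)

# Example: dynamics x' = u with safe set h(x) = 1 - x >= 0 (Lfh = 0, Lgh = -1).
x, dt = 0.0, 0.01
for _ in range(1000):
    u = cbf_filter(u_nom=2.0, h=1.0 - x, Lfh=0.0, Lgh=-1.0)
    x += dt * u
print(f"x = {x:.5f} approaches but never crosses the barrier at 1.0")
```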
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
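One way to picture a confidence-based filter, as an illustration of the general idea rather than the paper's construction: an ensemble of dynamics models stands in for model uncertainty, and the RL action is kept only when every member predicts a safe next state.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed "learned" ensemble: five perturbed copies of a scalar model.
ensemble = [lambda x, u, b=b: x + 0.1 * u + 0.01 * b for b in rng.normal(size=5)]

def filtered_action(x, u_rl, u_fallback=0.0, safe=lambda x: abs(x) <= 1.0):
    # Keep the RL action only if all ensemble members agree it is safe.
    if all(safe(model(x, u_rl)) for model in ensemble):
        return u_rl
    return u_fallback                    # low confidence: act cautiously

x = 0.9
print("chosen control:", filtered_action(x, u_rl=2.0))  # falls back to 0.0
```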
- Guided Safe Shooting: model based reinforcement learning with safety constraints [3.8490154494129327]
We introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints.
We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm.
arXiv Detail & Related papers (2022-06-20T12:46:35Z)
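A bare-bones version of the simplest planner mentioned above, random shooting with a safety check; the MAP-Elites planners are not reproduced, and the model, reward, and safe set are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def plan(x0, model, reward, safe, horizon=10, n_samples=256):
    """Sample action sequences through the model, discard any rollout that
    violates the safety check, and return the first action of the best
    surviving sequence (applied MPC-style)."""
    best_u, best_ret = 0.0, -np.inf
    for _ in range(n_samples):
        us = rng.uniform(-1.0, 1.0, size=horizon)
        x, ret, ok = x0, 0.0, True
        for u in us:
            x = model(x, u)
            ret += reward(x)
            if not safe(x):
                ok = False               # unsafe rollout: discard it
                break
        if ok and ret > best_ret:
            best_u, best_ret = us[0], ret
    return best_u

model = lambda x, u: x + 0.1 * u
reward = lambda x: x                     # prefer large x ...
safe = lambda x: x <= 0.5                # ... but never exceed 0.5
print("planned action from x = 0.45:", round(plan(0.45, model, reward, safe), 3))
```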
- Model-Based Safe Reinforcement Learning with Time-Varying State and Control Constraints: An Application to Intelligent Vehicles [13.40143623056186]
This paper proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints.
A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and guide the policy to update safely.
The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment.
arXiv Detail & Related papers (2021-12-18T10:45:31Z)
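The multi-step evaluation mechanism can be sketched as a k-step rollout of the current policy through an assumed model, flagging the first predicted violation of a time-varying constraint; this illustrates the stated idea, not the paper's algorithm.

```python
def multi_step_risk(x, t, policy, model, constraint, k=5):
    """Return the first lookahead step (1..k) whose predicted state violates
    the time-varying constraint (constraint(t, x) <= 0 means safe), or None
    if the whole window is predicted safe."""
    for i in range(1, k + 1):
        x = model(x, policy(x))
        if constraint(t + i, x) > 0:
            return i                     # guide the update to back off here
    return None

# Toy setup: the allowed upper bound on x shrinks over time.
model = lambda x, u: x + 0.1 * u
policy = lambda x: 1.0                   # current policy: always push upward
constraint = lambda t, x: x - (1.0 - 0.01 * t)
print("first predicted violation at step:",
      multi_step_risk(x=0.8, t=10, policy=policy, model=model, constraint=constraint))
```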
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
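The safety-critic idea in miniature, a schematic reading of the abstract rather than the paper's training procedure: a critic estimating failure probability, assumed here to have been learned on earlier tasks, masks out risky candidate actions on a new task.

```python
import numpy as np

def constrained_argmax(q_task, q_safe, state, actions, eps=0.1):
    """Greedy action over task value, restricted to actions whose estimated
    failure probability q_safe(s, a) is below the tolerance eps."""
    safe_actions = [a for a in actions if q_safe(state, a) <= eps]
    pool = safe_actions or actions       # degrade gracefully if none pass
    return max(pool, key=lambda a: q_task(state, a))

# Placeholder critics: task value prefers large actions, risk grows with them.
q_task = lambda s, a: a
q_safe = lambda s, a: 1.0 / (1.0 + np.exp(-2.0 * (a - 2.0)))  # sigmoid risk
actions = [0.0, 0.5, 1.0, 1.5, 2.0]
print("chosen action:", constrained_argmax(q_task, q_safe, None, actions))
```

Here the critic vetoes the highest-value actions, so the greedy choice falls to the largest action still deemed safe (0.5 with these placeholder numbers).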
- Runtime-Safety-Guided Policy Repair [13.038017178545728]
We study the problem of policy repair for learning-based control policies in safety-critical settings.
We propose to reduce or even eliminate control switching by 'repairing' the trained policy based on runtime data produced by the safety controller.
arXiv Detail & Related papers (2020-08-17T23:31:48Z)
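A schematic of runtime-data-driven policy repair, sketched from the stated idea rather than the authors' algorithm: log the safety controller's overrides during deployment, then nudge the learned policy toward the logged safe actions so future switching becomes rarer.

```python
import numpy as np

class LinearPolicy:
    """Deliberately simple 'trained' policy: u = w * x with an unsafe gain."""
    def __init__(self, w=2.0):
        self.w = w
    def __call__(self, x):
        return self.w * x
    def repair(self, data, lr=0.1, epochs=50):
        # Least-squares fit to the logged safe actions via SGD.
        for _ in range(epochs):
            for x, u_safe in data:
                self.w -= lr * (self.w * x - u_safe) * x

policy, overrides = LinearPolicy(), []
safe_controller = lambda x: np.clip(2.0 * x, -1.0, 1.0)  # assumed safe law
for x in np.linspace(-2, 2, 21):          # one "deployment" pass
    u, u_safe = policy(x), safe_controller(x)
    if u != u_safe:                        # switching event: log the override
        overrides.append((x, u_safe))
policy.repair(overrides)
print(f"repaired gain {policy.w:.2f} (was 2.00): fewer future overrides")
```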
- Neural Certificates for Safe Control Policies [108.4560749465701]
This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching.
We show the effectiveness of the method to learn both safe and goal-reaching policies on various systems, including pendulums, cart-poles, and UAVs.
arXiv Detail & Related papers (2020-06-15T15:14:18Z)
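The certificate condition such methods enforce can be checked numerically. The sketch below verifies a hand-picked quadratic certificate and linear policy on a double integrator; the joint learning of policy and neural certificate described in the paper is omitted.

```python
import numpy as np

f = lambda x, u: np.array([x[1], u])                 # double-integrator model
policy = lambda x: -1.0 * x[0] - 1.5 * x[1]          # stabilizing linear policy
V = lambda x: 1.5 * x[0]**2 + x[0] * x[1] + x[1]**2  # candidate certificate

def certify(n=10_000, dt=0.01, seed=3):
    """Sample states and check the Lyapunov-style decrease condition:
    V must strictly decrease along closed-loop trajectories away from
    the goal, which certifies goal-reaching (barrier conditions for
    safety are checked analogously)."""
    rng = np.random.default_rng(seed)
    for _ in range(n):
        x = rng.uniform(-1, 1, size=2)
        x_next = x + dt * f(x, policy(x))
        if V(x) > 1e-6 and V(x_next) >= V(x):
            return False                             # decrease condition fails
    return True

print("certificate holds on sampled states:", certify())
```

In the learned setting, V would be a neural network and the decrease condition a training objective plus a verification step; the check above is the acceptance test, not the learning loop.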
This list is automatically generated from the titles and abstracts of the papers on this site.