Data Generation Method for Learning a Low-dimensional Safe Region in
Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2109.05077v1
- Date: Fri, 10 Sep 2021 19:22:43 GMT
- Title: Data Generation Method for Learning a Low-dimensional Safe Region in
Safe Reinforcement Learning
- Authors: Zhehua Zhou, Ozgur S. Oguz, Yi Ren, Marion Leibold and Martin Buss
- Abstract summary: Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process.
For implementing safe reinforcement learning on highly nonlinear and high-dimensional dynamical systems, one possible approach is to find a low-dimensional safe region via data-driven feature extraction methods.
As the reliability of the learned safety estimates is data-dependent, we investigate in this work how different training data will affect the safe reinforcement learning approach.
- Score: 9.903083270841638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe reinforcement learning aims to learn a control policy while ensuring
that neither the system nor the environment gets damaged during the learning
process. For implementing safe reinforcement learning on highly nonlinear and
high-dimensional dynamical systems, one possible approach is to find a
low-dimensional safe region via data-driven feature extraction methods, which
provides safety estimates to the learning algorithm. As the reliability of the
learned safety estimates is data-dependent, we investigate in this work how
different training data will affect the safe reinforcement learning approach.
By balancing between the learning performance and the risk of being unsafe, a
data generation method that combines two sampling methods is proposed to
generate representative training data. The performance of the method is
demonstrated with a three-link inverted pendulum example.
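The abstract does not name the two sampling methods that are combined. As an illustrative sketch only (the `is_safe` oracle, the samplers, and all parameters here are hypothetical stand-ins, not the paper's method), representative training data for a safety estimate might mix uniform samples over the state space with samples concentrated near the estimated boundary of the safe region:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_safe(state):
    # Hypothetical safety oracle: a simple norm bound stands in for
    # simulating the real system from `state` and checking for failure.
    return np.linalg.norm(state) < 1.0

def uniform_samples(n, dim, low=-2.0, high=2.0):
    # Global coverage: uniform samples over the whole state space.
    return rng.uniform(low, high, size=(n, dim))

def boundary_samples(n, dim, radius=1.0, noise=0.1):
    # Local refinement: samples concentrated near the (estimated) boundary
    # of the safe region, where label information is most valuable.
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return directions * (radius + rng.normal(0.0, noise, size=(n, 1)))

def generate_training_data(n, dim, mix=0.5):
    # Balance coverage against risk by mixing the two samplers.
    n_uniform = int(n * mix)
    states = np.vstack([uniform_samples(n_uniform, dim),
                        boundary_samples(n - n_uniform, dim)])
    labels = np.array([is_safe(s) for s in states])
    return states, labels

states, labels = generate_training_data(200, dim=4)
```

The intuition behind such a mix: uniform samples keep the learned safety estimate from being confidently wrong in unexplored regions, while boundary-focused samples sharpen the decision boundary where mislabeling is most costly.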
Related papers
- Global Safe Sequential Learning via Efficient Knowledge Transfer [21.817220232038157]
In many medical or engineering applications, the data selection is constrained by a priori unknown safety conditions.
We propose transfer safe sequential learning to accelerate the learning of safety.
arXiv Detail & Related papers (2024-02-22T09:43:25Z)
- State-Wise Safe Reinforcement Learning With Pixel Observations [12.338614299403305]
We propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions.
As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations.
We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return.
arXiv Detail & Related papers (2023-11-03T20:32:30Z)
- In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z)
- Safe Exploration Method for Reinforcement Learning under Existence of Disturbance [1.1470070927586016]
We deal with a safe exploration problem in reinforcement learning under the existence of disturbance.
We propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance.
We illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
arXiv Detail & Related papers (2022-09-30T13:00:33Z)
- Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions [35.9713619595494]
Reinforcement learning and continuous nonlinear control have been successfully deployed in multiple domains of complicated sequential decision-making tasks.
Given the exploratory nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks.
We propose a provably efficient episodic safe learning framework for online control tasks.
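A control barrier function certifies safety by requiring the barrier value not to decay too fast along the closed-loop trajectory. The following toy sketch shows the idea (the 1-D system, the barrier function, and the grid-based input selection are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

# Hypothetical 1-D system x_{k+1} = x_k + u_k with safe set {x : |x| <= 1},
# encoded by the barrier function h(x) = 1 - x^2 (h(x) >= 0 means safe).
def h(x):
    return 1.0 - x ** 2

def cbf_filter(x, u_ref, alpha=0.5, candidates=np.linspace(-0.5, 0.5, 101)):
    # Safety filter: among a grid of admissible inputs, return the one
    # closest to the reference input that satisfies the discrete-time CBF
    # condition h(x + u) >= (1 - alpha) * h(x), which renders the safe set
    # forward invariant.
    feasible = [u for u in candidates if h(x + u) >= (1 - alpha) * h(x)]
    if not feasible:
        # Fallback: no candidate certifies safety; take the safest one.
        return max(candidates, key=lambda u: h(x + u))
    return min(feasible, key=lambda u: abs(u - u_ref))

x = 0.9                          # state already close to the boundary
u = cbf_filter(x, u_ref=0.5)     # the reference input would overshoot it
```

Near the boundary the filter clips the aggressive reference input down to a small value that keeps the barrier condition satisfied; far from the boundary it passes the reference through unchanged.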
arXiv Detail & Related papers (2022-07-29T00:54:35Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
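The log-barrier idea can be seen in a minimal sketch (this is not the paper's algorithm: the example problem, the finite-difference gradient, and the backtracking step-size rule are assumptions standing in for LBSGD's derived step size):

```python
import numpy as np

def log_barrier_value(f, gs, x, eta):
    # Barrier surrogate f(x) - eta * sum_i log(-g_i(x)); it is finite only
    # while every constraint g_i(x) < 0 holds, so descent on it keeps the
    # iterates strictly feasible.
    slacks = np.array([-g(x) for g in gs])
    if np.any(slacks <= 0):
        return np.inf
    return f(x) - eta * np.sum(np.log(slacks))

def numerical_grad(phi, x, h=1e-6):
    # Central finite-difference gradient (keeps the sketch black-box).
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (phi(x + e) - phi(x - e)) / (2 * h)
    return grad

def lbsgd_sketch(f, gs, x0, eta=0.1, lr=0.02, steps=300):
    # Gradient descent on the barrier surrogate; the step is halved whenever
    # a trial point would leave the feasible set. (The paper derives a
    # principled step size; this backtracking is only a stand-in.)
    phi = lambda x: log_barrier_value(f, gs, x, eta)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = numerical_grad(phi, x)
        step = lr
        while not np.isfinite(phi(x - step * g)):
            step *= 0.5
        x = x - step * g
    return x

# Minimize f(x) = (x - 2)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
# The barrier optimum sits just inside the constraint, near x ≈ 0.952.
x_star = lbsgd_sketch(lambda x: (x[0] - 2) ** 2, [lambda x: x[0] - 1], [0.0])
```

Because the barrier term blows up as a constraint becomes active, every iterate remains strictly feasible, which is exactly the property that makes this family of methods attractive for safe learning.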
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition [59.94644674087599]
We propose SAFEty skill pRiors (SAFER), an algorithm that accelerates policy learning on complex control tasks under safety constraints.
Through principled training on an offline dataset, SAFER learns to extract safe primitive skills.
In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies.
arXiv Detail & Related papers (2022-02-10T05:43:41Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
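The paper formulates chance constraints at the trajectory level; as a minimal sketch of the standard building block (the scalar state, noise model, and numbers below are illustrative assumptions), a Gaussian chance constraint tightens to a deterministic one via the normal quantile:

```python
from statistics import NormalDist

def tightened_bound(b, sigma, delta):
    # Gaussian chance constraint P(x + w <= b) >= 1 - delta, with additive
    # noise w ~ N(0, sigma^2), is equivalent to the deterministic constraint
    # x <= b - z * sigma, where z is the (1 - delta) standard normal quantile.
    z = NormalDist().inv_cdf(1.0 - delta)
    return b - z * sigma

# Staying below b = 1.0 with 95% confidence under noise sigma = 0.1 means
# planning against a tightened bound of roughly 1.0 - 1.645 * 0.1.
tight = tightened_bound(1.0, sigma=0.1, delta=0.05)
```

The tightening shrinks the usable state space in proportion to the uncertainty, which is how a motion planner can explore while bounding the probability of constraint violation.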
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.