Protective Policy Transfer
- URL: http://arxiv.org/abs/2012.06662v1
- Date: Fri, 11 Dec 2020 22:10:54 GMT
- Title: Protective Policy Transfer
- Authors: Wenhao Yu, C. Karen Liu, Greg Turk
- Abstract summary: We introduce a policy transfer algorithm for adapting robot motor skills to novel scenarios.
Our algorithm trains two control policies: a task policy that is optimized to complete the task of interest, and a protective policy that is dedicated to keeping the robot from unsafe events.
We evaluate our approach on four simulated robot locomotion problems and a 2D navigation problem.
- Score: 37.897395735552706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Being able to transfer existing skills to new situations is a key capability
when training robots to operate in unpredictable real-world environments. A
successful transfer algorithm should not only minimize the number of samples
that the robot needs to collect in the new environment, but also prevent the
robot from damaging itself or the surrounding environment during the transfer
process. In this work, we introduce a policy transfer algorithm for adapting
robot motor skills to novel scenarios while minimizing serious failures. Our
algorithm trains two control policies in the training environment: a task
policy that is optimized to complete the task of interest, and a protective
policy that is dedicated to keeping the robot from unsafe events (e.g., falling to
the ground). To decide which policy to use during execution, we learn a safety
estimator model in the training environment that estimates a continuous safety
level of the robot. When used with a set of thresholds, the safety estimator
becomes a classifier for switching between the protective policy and the task
policy. We evaluate our approach on four simulated robot locomotion problems
and a 2D navigation problem and show that our method can achieve successful
transfer to notably different environments while taking the robot's safety into
consideration.
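The switching rule described in the abstract is simple enough to sketch. Below is a minimal, hypothetical illustration of threshold-based switching between a task policy and a protective policy; the two-threshold hysteresis, the [0, 1] safety range, and all class and parameter names are assumptions for illustration, not the authors' implementation (the paper only specifies a safety estimator used with a set of thresholds).

```python
# Minimal sketch of threshold-based policy switching (illustrative only).
# ProtectiveSwitcher and both thresholds are hypothetical; the paper
# says only that the safety estimator, combined with thresholds, acts
# as a classifier for choosing between the two policies.

class ProtectiveSwitcher:
    def __init__(self, task_policy, protective_policy, safety_estimator,
                 enter_threshold=0.3, exit_threshold=0.6):
        self.task_policy = task_policy              # optimized for the task
        self.protective_policy = protective_policy  # keeps the robot safe
        self.safety_estimator = safety_estimator    # observation -> safety level
        self.enter_threshold = enter_threshold      # engage protection below this
        self.exit_threshold = exit_threshold        # release protection above this
        self.protecting = False

    def act(self, observation):
        safety = self.safety_estimator(observation)
        if self.protecting:
            # Stay protective until the state is comfortably safe again,
            # so control does not chatter near a single boundary.
            self.protecting = safety < self.exit_threshold
        else:
            self.protecting = safety < self.enter_threshold
        active = self.protective_policy if self.protecting else self.task_policy
        return active(observation)
```

Using two thresholds (engage below 0.3, release only above 0.6) is one plausible way to realize "a set of thresholds": it prevents rapid oscillation between the two policies when the estimated safety level hovers near a single cutoff.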
Related papers
- Safe Policy Exploration Improvement via Subgoals [44.07721205323709]
Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups.
One of the main reasons for poor performance in such setups is that the need to respect safety constraints degrades the exploration capabilities of an RL agent.
We introduce a novel learnable algorithm that is based on decomposing the initial problem into smaller sub-problems via intermediate goals.
arXiv Detail & Related papers (2024-08-25T16:12:49Z)
- RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes [57.319845580050924]
We propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum.
We show that our algorithm is capable of learning high-speed policies for a real-world off-road driving task.
arXiv Detail & Related papers (2024-05-07T23:32:36Z)
- Task and Domain Adaptive Reinforcement Learning for Robot Control [0.34137115855910755]
We present a novel adaptive agent that dynamically adapts its policy in response to different tasks and environmental conditions.
The agent is trained using a custom, highly parallelized simulator built on IsaacGym.
We perform zero-shot transfer to fly a blimp in the real world to solve various tasks.
arXiv Detail & Related papers (2024-04-29T14:02:02Z)
- Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy [7.915956857741506]
Existing safety methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior.
This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the robot's evolving uncertainty.
arXiv Detail & Related papers (2023-09-03T20:34:01Z)
- Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully observable robot policy that generates supervision for a partially observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera for pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z)
- Safe reinforcement learning of dynamic high-dimensional robotic tasks: navigation, manipulation, interaction [31.553783147007177]
In reinforcement learning, safety is all the more fundamental: the agent must explore an environment without causing any damage.
This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks.
Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data.
arXiv Detail & Related papers (2022-09-27T11:23:49Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters, such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across robots.
We propose a novel method named REvolveR that uses continuous evolutionary models for robotic policy transfer, implemented in a physics simulator (a hedged sketch of this interpolation idea appears after this list).
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve strong performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation as a few-shot meta-learning problem in which the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching task and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees [63.31965375413414]
We propose to solve multi-task problems through learning structured policies from human demonstrations.
Our structured policy is inspired by RMPflow, a framework for combining subtask policies defined on different spaces.
We derive an end-to-end learning objective function that is suitable for the multi-task problem.
arXiv Detail & Related papers (2020-12-24T22:46:22Z)
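The REvolveR entry above hinges on transferring a policy through a continuum of intermediate robots. The sketch below illustrates that idea under stated assumptions: the linear parameter blend, the fixed schedule, and the `make_robot` and `finetune` callables are hypothetical placeholders for illustration, not the paper's actual evolutionary model.

```python
import numpy as np

# Illustrative sketch of policy transfer through a sequence of intermediate
# robots, in the spirit of the REvolveR entry above. The linear blend of
# robot parameters and the make_robot/finetune helpers are assumptions,
# not the paper's method.

def transfer_policy(policy, source_params, target_params,
                    make_robot, finetune, num_steps=20):
    """Fine-tune `policy` on simulated robots that morph gradually
    from the source robot to the target robot."""
    source = np.asarray(source_params, dtype=float)
    target = np.asarray(target_params, dtype=float)
    for step in range(1, num_steps + 1):
        alpha = step / num_steps                    # 0 -> 1 over the schedule
        blended = (1.0 - alpha) * source + alpha * target
        robot = make_robot(blended)                 # instantiate intermediate robot
        policy = finetune(policy, robot)            # brief fine-tuning per stage
    return policy
```

Each stage changes the robot only slightly, so the policy from the previous stage remains a good initialization, which is what makes the gradual transfer tractable.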
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.