Smooth Exploration for Robotic Reinforcement Learning
- URL: http://arxiv.org/abs/2005.05719v2
- Date: Sun, 20 Jun 2021 09:49:35 GMT
- Title: Smooth Exploration for Robotic Reinforcement Learning
- Authors: Antonin Raffin, Jens Kober, Freek Stulp
- Abstract summary: Reinforcement learning (RL) enables robots to learn skills from interactions with the real world.
In practice, the unstructured step-based exploration used in Deep RL leads to jerky motion patterns on real robots.
We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms.
- Score: 11.215352918313577
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reinforcement learning (RL) enables robots to learn skills from interactions
with the real world. In practice, the unstructured step-based exploration used
in Deep RL -- often very successful in simulation -- leads to jerky motion
patterns on real robots. Consequences of the resulting shaky behavior are poor
exploration, or even damage to the robot. We address these issues by adapting
state-dependent exploration (SDE) to current Deep RL algorithms. To enable this
adaptation, we propose two extensions to the original SDE, using more general
features and re-sampling the noise periodically, which leads to a new
exploration method generalized state-dependent exploration (gSDE). We evaluate
gSDE both in simulation, on PyBullet continuous control tasks, and directly on
three different real robots: a tendon-driven elastic robot, a quadruped and an
RC car. The noise sampling interval of gSDE permits to have a compromise
between performance and smoothness, which allows training directly on the real
robots without loss of performance. The code is available at
https://github.com/DLR-RM/stable-baselines3.
Related papers
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots.
We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.
This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning [4.260312058817663]
We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
arXiv Detail & Related papers (2023-04-24T13:24:00Z) - Domain Randomization for Robust, Affordable and Effective Closed-loop
Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety to contacts and adaptability.
We show how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
arXiv Detail & Related papers (2023-03-07T18:50:00Z) - Learning Bipedal Walking for Humanoids with Current Feedback [5.429166905724048]
We present an approach for overcoming the sim2real gap issue for humanoid robots arising from inaccurate torque-tracking at the actuator level.
Our approach successfully trains a unified, end-to-end policy in simulation that can be deployed on a real HRP-5P humanoid robot to achieve bipedal locomotion.
arXiv Detail & Related papers (2023-03-07T08:16:46Z) - Accelerating Robotic Reinforcement Learning via Parameterized Action
Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
arXiv Detail & Related papers (2021-10-28T17:59:30Z) - OSCAR: Data-Driven Operational Space Control for Adaptive and Robust
Robot Manipulation [50.59541802645156]
Operational Space Control (OSC) has been used as an effective task-space controller for manipulation.
We propose OSC for Adaptation and Robustness (OSCAR), a data-driven variant of OSC that compensates for modeling errors.
We evaluate our method on a variety of simulated manipulation problems, and find substantial improvements over an array of controller baselines.
arXiv Detail & Related papers (2021-10-02T01:21:38Z) - SABER: Data-Driven Motion Planner for Autonomously Navigating
Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z) - Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z) - Robust Reinforcement Learning-based Autonomous Driving Agent for
Simulation and Real World [0.0]
We present a DRL-based algorithm that is capable of performing autonomous robot control using Deep Q-Networks (DQN)
In our approach, the agent is trained in a simulated environment and it is able to navigate both in a simulated and real-world environment.
The trained agent is able to run on limited hardware resources and its performance is comparable to state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-23T15:23:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.