Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning
- URL: http://arxiv.org/abs/2304.12080v1
- Date: Mon, 24 Apr 2023 13:24:00 GMT
- Title: Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning
- Authors: Simón C. Smith, Bryan Lim, Hannah Janmohamed, Antoine Cully
- Abstract summary: We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
- Score: 4.260312058817663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning algorithms, like Quality-Diversity (QD), can be used to acquire
repertoires of diverse robotics skills. This learning is commonly done via
computer simulation due to the large number of evaluations required. However,
training in a virtual environment generates a gap between simulation and
reality. Here, we build upon the Reset-Free QD (RF-QD) algorithm to learn
controllers directly on a physical robot. This method uses a dynamics model,
learned from interactions between the robot and the environment, to predict the
robot's behaviour and improve sample efficiency. A behaviour selection policy
filters out uninteresting or unsafe policies predicted by the model. RF-QD also
includes a recovery policy that returns the robot to a safe zone when it has
walked outside of it, allowing continuous learning. We demonstrate that our
method enables a physical quadruped robot to learn a repertoire of behaviours
in two hours without human supervision. We successfully test the solution
repertoire using a maze navigation task. Finally, we compare our approach to
the MAP-Elites algorithm. We show that dynamics awareness and a recovery policy
are required to generate an optimal archive when training on a physical robot.
Video available at https://youtu.be/BgGNvIsRh7Q
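To make the moving parts concrete, here is a minimal MAP-Elites-style loop with dynamics-aware filtering in the spirit of RF-QD. Everything in it (`rollout_on_robot`, `predicted_behaviour`, the safe-zone threshold, the archive discretisation) is a toy stand-in invented for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
PARAM_DIM = 8   # size of a controller's parameter vector (invented)

def rollout_on_robot(params):
    """Stand-in for a real rollout on the quadruped: returns the observed
    2-D behaviour descriptor (e.g. final x-y position) and a fitness."""
    behaviour = np.tanh(params[:2]) + rng.normal(scale=0.05, size=2)
    return behaviour, -float(np.sum(params ** 2))

def predicted_behaviour(params):
    """Stand-in for the learned dynamics model: predicts the behaviour a
    controller would produce without spending a trial on the robot."""
    return np.tanh(params[:2])

def passes_selection(pred):
    """Simplified behaviour selection policy: reject controllers predicted
    to leave the safe zone; a fuller version would also score novelty."""
    return bool(np.all(np.abs(pred) < 0.9))

def to_cell(behaviour, bins=10):
    """Discretise a behaviour descriptor into a MAP-Elites archive cell."""
    clipped = np.clip(behaviour, -1.0, 1.0 - 1e-9)
    return tuple(((clipped + 1.0) / 2.0 * bins).astype(int))

archive = {}  # cell -> (controller params, fitness)
for _ in range(500):
    if archive:  # mutate a random elite once the archive is non-empty
        cells = list(archive)
        parent, _ = archive[cells[rng.integers(len(cells))]]
        candidate = parent + rng.normal(scale=0.1, size=PARAM_DIM)
    else:
        candidate = rng.normal(size=PARAM_DIM)

    # Dynamics-aware filtering: skip controllers the model predicts to be
    # unsafe, saving costly evaluations on the physical robot.
    if not passes_selection(predicted_behaviour(candidate)):
        continue

    behaviour, fitness = rollout_on_robot(candidate)
    # (On hardware, the recovery policy would walk the robot back into the
    # safe zone here whenever it strayed out, avoiding manual resets.)

    cell = to_cell(behaviour)
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (candidate, fitness)

print(f"archive covers {len(archive)} behaviour cells")
```

The filtering step is where the sample efficiency comes from: model predictions are cheap, whereas every real rollout costs robot time and wear.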
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning paradigm through which a robot can acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval; a toy sketch of the stitching idea follows this entry.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
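As a loose illustration of the demonstration-stitching idea behind GSR (not its actual state representation or retrieval machinery), the toy sketch below builds a graph from transitions observed across several suboptimal demonstrations and searches it for a path shorter than any single demo:

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: demonstrations are sequences of discrete states;
# GSR's real states and retrieval are far richer than this.
demos = [
    ["s0", "s1", "s2", "goal"],   # suboptimal: three steps to the goal
    ["s0", "s3", "goal"],         # a shortcut appears in another demo
    ["s4", "s1", "s3"],
]

# Build a graph whose edges are transitions observed anywhere in the data.
graph = defaultdict(set)
for demo in demos:
    for a, b in zip(demo, demo[1:]):
        graph[a].add(b)

def shortest_path(start, goal):
    """Uniform-cost search: stitching transitions from different
    (individually suboptimal) demos can beat any single demonstration."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            heapq.heappush(frontier, (cost + 1, nxt, path + [nxt]))
    return None

print(shortest_path("s0", "goal"))  # ['s0', 's3', 'goal']
```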
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices by learning both to do and to undo the task, while inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis works towards learning reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback (e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons) and describe how a robot can use them to infer a reward function; a minimal pairwise-preference sketch follows this entry.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
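For the pairwise-comparison case, the standard tool is the Bradley-Terry model, in which the probability that a user prefers one trajectory over another is a logistic function of their reward difference. The sketch below fits a linear reward from simulated choices; the feature vectors, linear-reward assumption, and simulated annotator are invented for illustration and are not claimed to match the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each trajectory is summarised by a feature vector phi(xi); the reward is
# assumed linear in those features, r(xi) = w . phi(xi).
true_w = np.array([1.0, -2.0, 0.5])
features = rng.normal(size=(100, 3))  # 100 candidate trajectories

def simulate_choice(i, j):
    """A noisy human picks the trajectory with higher true reward."""
    p_i = 1.0 / (1.0 + np.exp(features[j] @ true_w - features[i] @ true_w))
    return i if rng.random() < p_i else j

pairs = [(a, b) for a, b in rng.integers(0, 100, size=(300, 2)) if a != b]
answers = [simulate_choice(i, j) for i, j in pairs]

# Fit w by gradient ascent on the Bradley-Terry log-likelihood:
# P(i preferred over j) = sigmoid(w . (phi(xi) - phi(xj)))
w = np.zeros(3)
lr = 0.05
for _ in range(500):
    grad = np.zeros(3)
    for (i, j), winner in zip(pairs, answers):
        diff = features[i] - features[j] if winner == i else features[j] - features[i]
        grad += diff * (1.0 - 1.0 / (1.0 + np.exp(-w @ diff)))
    w += lr * grad / len(pairs)

print("recovered direction:", w / np.linalg.norm(w))
print("true direction:     ", true_w / np.linalg.norm(true_w))
```

Only the direction of w is identifiable from comparisons, since scaling the reward leaves every preference probability's ordering intact up to the noise temperature.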
- Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research [21.200764836237497]
We present the first systematic benchmarking of fabric manipulation algorithms on physical hardware.
We develop 4 novel learning-based algorithms that model expert actions, keypoints, reward functions, and dynamic motions.
The entire lifecycle of data collection, model training, and policy evaluation is performed remotely without physical access to the robot workcell.
arXiv Detail & Related papers (2022-04-21T17:31:20Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched between the two robots.
We propose REvolveR, a novel method that uses continuous evolutionary models, implemented in a physics simulator, for robot-to-robot policy transfer; a toy sketch of the interpolation schedule follows this entry.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
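The toy sketch below shows only the interpolation schedule at the core of the idea: fine-tune a policy on a sequence of intermediate robots whose parameters blend gradually from the source robot to the target. The synthetic `performance` objective and hill-climbing `finetune` are stand-ins for the physics simulator and RL fine-tuning used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "robot" is a parameter vector (e.g. link lengths,
# masses), and the best policy depends on those parameters.
source_robot = np.array([1.0, 0.5])
target_robot = np.array([2.0, 1.5])

def performance(policy, robot):
    """Stand-in for a simulator rollout: here the optimal policy for a
    robot is simply the robot's own parameter vector."""
    return -np.sum((policy - robot) ** 2)

def finetune(policy, robot, steps=50, sigma=0.05):
    """Simple hill climbing as a placeholder for the RL fine-tuning that
    would run at each evolutionary stage."""
    for _ in range(steps):
        candidate = policy + rng.normal(scale=sigma, size=policy.shape)
        if performance(candidate, robot) > performance(policy, robot):
            policy = candidate
    return policy

# Core idea: instead of transferring the source policy in one jump,
# fine-tune it on a sequence of interpolated intermediate robots.
policy = finetune(source_robot.copy(), source_robot)
for alpha in np.linspace(0.0, 1.0, 11):
    robot = (1 - alpha) * source_robot + alpha * target_robot
    policy = finetune(policy, robot)

print("final policy:", policy)
print("performance on target:", performance(policy, target_robot))
```

Each intermediate robot is close enough to the previous one that the policy never faces a large transfer gap at any single step.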
- Learning Bipedal Robot Locomotion from Human Movement [0.791553652441325]
We present a reinforcement learning-based method for teaching a real-world bipedal robot to perform movements directly from motion capture data.
Our method seamlessly transitions from training in a simulation environment to executing on a physical robot.
We demonstrate our method on an internally developed humanoid robot with movements ranging from a dynamic walk cycle to complex balancing and waving.
arXiv Detail & Related papers (2021-05-26T00:49:37Z)