Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning
- URL: http://arxiv.org/abs/2304.12080v1
- Date: Mon, 24 Apr 2023 13:24:00 GMT
- Title: Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning
- Authors: Simón C. Smith, Bryan Lim, Hannah Janmohamed, Antoine Cully
- Abstract summary: We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
- Score: 4.260312058817663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning algorithms, like Quality-Diversity (QD), can be used to acquire
repertoires of diverse robotics skills. This learning is commonly done via
computer simulation due to the large number of evaluations required. However,
training in a virtual environment generates a gap between simulation and
reality. Here, we build upon the Reset-Free QD (RF-QD) algorithm to learn
controllers directly on a physical robot. This method uses a dynamics model,
learned from interactions between the robot and the environment, to predict the
robot's behaviour and improve sample efficiency. A behaviour selection policy
filters out uninteresting or unsafe policies predicted by the model. RF-QD also
includes a recovery policy that returns the robot to a safe zone when it has
walked outside of it, allowing continuous learning. We demonstrate that our
method enables a physical quadruped robot to learn a repertoire of behaviours
in two hours without human supervision. We successfully test the solution
repertoire using a maze navigation task. Finally, we compare our approach to
the MAP-Elites algorithm. We show that dynamics awareness and a recovery policy
are required to generate an optimal archive when training on a physical robot.
Video available at https://youtu.be/BgGNvIsRh7Q
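To make the moving parts concrete, here is a minimal MAP-Elites-style loop with dynamics-aware filtering in the spirit of RF-QD. Everything in it (`rollout_on_robot`, `predicted_behaviour`, the safe-zone threshold, the archive discretisation) is a toy stand-in invented for illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
PARAM_DIM = 8   # size of a controller's parameter vector (invented)

def rollout_on_robot(params):
    """Stand-in for a real rollout on the quadruped: returns the observed
    2-D behaviour descriptor (e.g. final x-y position) and a fitness."""
    behaviour = np.tanh(params[:2]) + rng.normal(scale=0.05, size=2)
    return behaviour, -float(np.sum(params ** 2))

def predicted_behaviour(params):
    """Stand-in for the learned dynamics model: predicts the behaviour a
    controller would produce without spending a trial on the robot."""
    return np.tanh(params[:2])

def passes_selection(pred):
    """Simplified behaviour selection policy: reject controllers predicted
    to leave the safe zone; a fuller version would also score novelty."""
    return bool(np.all(np.abs(pred) < 0.9))

def to_cell(behaviour, bins=10):
    """Discretise a behaviour descriptor into a MAP-Elites archive cell."""
    clipped = np.clip(behaviour, -1.0, 1.0 - 1e-9)
    return tuple(((clipped + 1.0) / 2.0 * bins).astype(int))

archive = {}  # cell -> (controller params, fitness)
for _ in range(500):
    if archive:  # mutate a random elite once the archive is non-empty
        cells = list(archive)
        parent, _ = archive[cells[rng.integers(len(cells))]]
        candidate = parent + rng.normal(scale=0.1, size=PARAM_DIM)
    else:
        candidate = rng.normal(size=PARAM_DIM)

    # Dynamics-aware filtering: skip controllers the model predicts to be
    # unsafe, saving costly evaluations on the physical robot.
    if not passes_selection(predicted_behaviour(candidate)):
        continue

    behaviour, fitness = rollout_on_robot(candidate)
    # (On hardware, the recovery policy would walk the robot back into the
    # safe zone here whenever it strayed out, avoiding manual resets.)

    cell = to_cell(behaviour)
    if cell not in archive or fitness > archive[cell][1]:
        archive[cell] = (candidate, fitness)

print(f"archive covers {len(archive)} behaviour cells")
```

The filtering step is where the sample efficiency comes from: model predictions are cheap, whereas every real rollout costs robot time and wear.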
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning paradigm through which a robot can acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval; a toy sketch of the stitching idea follows this entry.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
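As a loose illustration of the demonstration-stitching idea behind GSR (not its actual state representation or retrieval machinery), the toy sketch below builds a graph from transitions observed across several suboptimal demonstrations and searches it for a path shorter than any single demo:

```python
import heapq
from collections import defaultdict

# Hypothetical toy data: demonstrations are sequences of discrete states;
# GSR's real states and retrieval are far richer than this.
demos = [
    ["s0", "s1", "s2", "goal"],   # suboptimal: three steps to the goal
    ["s0", "s3", "goal"],         # a shortcut appears in another demo
    ["s4", "s1", "s3"],
]

# Build a graph whose edges are transitions observed anywhere in the data.
graph = defaultdict(set)
for demo in demos:
    for a, b in zip(demo, demo[1:]):
        graph[a].add(b)

def shortest_path(start, goal):
    """Uniform-cost search: stitching transitions from different
    (individually suboptimal) demos can beat any single demonstration."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph[node]:
            heapq.heappush(frontier, (cost + 1, nxt, path + [nxt]))
    return None

print(shortest_path("s0", "goal"))  # ['s0', 's3', 'goal']
```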
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices by learning both to do and to undo the task, while inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis works towards learning reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback (e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons) and describe how a robot can use them to infer a reward function; a minimal pairwise-preference sketch follows this entry.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
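For the pairwise-comparison case, the standard tool is the Bradley-Terry model, in which the probability that a user prefers one trajectory over another is a logistic function of their reward difference. The sketch below fits a linear reward from simulated choices; the feature vectors, linear-reward assumption, and simulated annotator are invented for illustration and are not claimed to match the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each trajectory is summarised by a feature vector phi(xi); the reward is
# assumed linear in those features, r(xi) = w . phi(xi).
true_w = np.array([1.0, -2.0, 0.5])
features = rng.normal(size=(100, 3))  # 100 candidate trajectories

def simulate_choice(i, j):
    """A noisy human picks the trajectory with higher true reward."""
    p_i = 1.0 / (1.0 + np.exp(features[j] @ true_w - features[i] @ true_w))
    return i if rng.random() < p_i else j

pairs = [(a, b) for a, b in rng.integers(0, 100, size=(300, 2)) if a != b]
answers = [simulate_choice(i, j) for i, j in pairs]

# Fit w by gradient ascent on the Bradley-Terry log-likelihood:
# P(i preferred over j) = sigmoid(w . (phi(xi) - phi(xj)))
w = np.zeros(3)
lr = 0.05
for _ in range(500):
    grad = np.zeros(3)
    for (i, j), winner in zip(pairs, answers):
        diff = features[i] - features[j] if winner == i else features[j] - features[i]
        grad += diff * (1.0 - 1.0 / (1.0 + np.exp(-w @ diff)))
    w += lr * grad / len(pairs)

print("recovered direction:", w / np.linalg.norm(w))
print("true direction:     ", true_w / np.linalg.norm(true_w))
```

Only the direction of w is identifiable from comparisons, since scaling the reward leaves every preference probability's ordering intact up to the noise temperature.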
- Learning to Fold Real Garments with One Arm: A Case Study in Cloud-Based Robotics Research [21.200764836237497]
We present the first systematic benchmarking of fabric manipulation algorithms on physical hardware.
We develop 4 novel learning-based algorithms that model expert actions, keypoints, reward functions, and dynamic motions.
The entire lifecycle of data collection, model training, and policy evaluation is performed remotely without physical access to the robot workcell.
arXiv Detail & Related papers (2022-04-21T17:31:20Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched between the two robots.
We propose REvolveR, a novel method that uses continuous evolutionary models, implemented in a physics simulator, for robot-to-robot policy transfer; a toy sketch of the interpolation schedule follows this entry.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
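The toy sketch below shows only the interpolation schedule at the core of the idea: fine-tune a policy on a sequence of intermediate robots whose parameters blend gradually from the source robot to the target. The synthetic `performance` objective and hill-climbing `finetune` are stand-ins for the physics simulator and RL fine-tuning used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "robot" is a parameter vector (e.g. link lengths,
# masses), and the best policy depends on those parameters.
source_robot = np.array([1.0, 0.5])
target_robot = np.array([2.0, 1.5])

def performance(policy, robot):
    """Stand-in for a simulator rollout: here the optimal policy for a
    robot is simply the robot's own parameter vector."""
    return -np.sum((policy - robot) ** 2)

def finetune(policy, robot, steps=50, sigma=0.05):
    """Simple hill climbing as a placeholder for the RL fine-tuning that
    would run at each evolutionary stage."""
    for _ in range(steps):
        candidate = policy + rng.normal(scale=sigma, size=policy.shape)
        if performance(candidate, robot) > performance(policy, robot):
            policy = candidate
    return policy

# Core idea: instead of transferring the source policy in one jump,
# fine-tune it on a sequence of interpolated intermediate robots.
policy = finetune(source_robot.copy(), source_robot)
for alpha in np.linspace(0.0, 1.0, 11):
    robot = (1 - alpha) * source_robot + alpha * target_robot
    policy = finetune(policy, robot)

print("final policy:", policy)
print("performance on target:", performance(policy, target_robot))
```

Each intermediate robot is close enough to the previous one that the policy never faces a large transfer gap at any single step.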
- Learning Bipedal Robot Locomotion from Human Movement [0.791553652441325]
We present a reinforcement learning-based method for teaching a real-world bipedal robot to perform movements directly from motion capture data.
Our method seamlessly transitions from training in a simulation environment to executing on a physical robot.
We demonstrate our method on an internally developed humanoid robot with movements ranging from a dynamic walk cycle to complex balancing and waving.
arXiv Detail & Related papers (2021-05-26T00:49:37Z)