Learning to Walk Autonomously via Reset-Free Quality-Diversity
- URL: http://arxiv.org/abs/2204.03655v1
- Date: Thu, 7 Apr 2022 14:07:51 GMT
- Title: Learning to Walk Autonomously via Reset-Free Quality-Diversity
- Authors: Bryan Lim, Alexander Reichenbach, Antoine Cully
- Abstract summary: Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
- Score: 73.08073762433376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quality-Diversity (QD) algorithms can discover large and complex behavioural
repertoires consisting of both diverse and high-performing skills. However, the
generation of behavioural repertoires has mainly been limited to simulation
environments instead of real-world learning. This is because existing QD
algorithms need large numbers of evaluations as well as episodic resets, which
require manual human supervision and interventions. This paper proposes
Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous
learning for robotics in open-ended environments. We build on Dynamics-Aware
Quality-Diversity (DA-QD) and introduce a behaviour selection policy that
leverages the diversity of the imagined repertoire and environmental
information to intelligently select behaviours that can act as automatic
resets. We demonstrate this through a task of learning to walk within defined
training zones with obstacles. Our experiments show that we can learn full
repertoires of legged locomotion controllers autonomously, without manual
resets and with high sample efficiency, in spite of harsh safety constraints.
Finally, using an ablation of different target objectives, we show that it is
important for RF-QD to make diverse types of solutions available to the
behaviour selection policy, rather than only solutions optimised for a
specific objective. Videos and code
available at https://sites.google.com/view/rf-qd.
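As a rough illustration of the behaviour-selection idea described above, the following Python sketch (with hypothetical names and heuristics, not the authors' code) picks a reset-like behaviour from the imagined repertoire when the robot nears the training-zone boundary, and a novel behaviour otherwise:

```python
# Hypothetical sketch of an RF-QD-style behaviour-selection loop.
# The repertoire format and thresholds are illustrative assumptions.
import numpy as np

def select_behaviour(repertoire, robot_xy, zone_centre, zone_radius):
    """Pick a behaviour from the imagined repertoire.

    Near the training-zone boundary, prefer behaviours whose imagined
    displacement points back towards the zone centre (an automatic
    reset); otherwise, pick a novel behaviour to keep exploring.
    """
    dist_to_centre = np.linalg.norm(robot_xy - zone_centre)
    if dist_to_centre > 0.8 * zone_radius:           # close to the boundary
        home = zone_centre - robot_xy                # desired direction
        scores = [np.dot(b["imagined_displacement"], home)
                  for b in repertoire]
        return repertoire[int(np.argmax(scores))]    # acts as a reset
    novelty = [b["novelty"] for b in repertoire]
    return repertoire[int(np.argmax(novelty))]       # keep exploring

# Toy usage with a two-behaviour repertoire.
repertoire = [
    {"imagined_displacement": np.array([1.0, 0.0]), "novelty": 0.9},
    {"imagined_displacement": np.array([-1.0, 0.0]), "novelty": 0.2},
]
b = select_behaviour(repertoire, robot_xy=np.array([0.9, 0.0]),
                     zone_centre=np.zeros(2), zone_radius=1.0)
```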
Related papers
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
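A minimal sketch of the reset-free training pattern such a suite enables, assuming hypothetical forward/backward agents and a learned reward model (none of these names come from SERL itself):

```python
# Illustrative reset-free training loop: a backward policy returns the
# scene to a start configuration, standing in for manual resets.
def train_reset_free(env, forward_agent, backward_agent, reward_model,
                     n_steps=100_000):
    obs, doing_task = env.reset(), True
    for _ in range(n_steps):
        agent = forward_agent if doing_task else backward_agent
        action = agent.act(obs)
        next_obs, _, done, _ = env.step(action)
        reward = reward_model(next_obs)      # learned reward classifier
        agent.buffer.add(obs, action, reward, next_obs, done)
        agent.update()                       # off-policy (e.g. SAC) step
        if done:
            doing_task = not doing_task      # backward policy "resets"
        obs = next_obs
```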
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - Enhancing Robotic Navigation: An Evaluation of Single and
Multi-Objective Reinforcement Learning Strategies [0.9208007322096532]
This study presents a comparative analysis between single-objective and multi-objective reinforcement learning methods for training a robot to navigate effectively to an end goal.
By modifying the reward function to return a vector of rewards, each pertaining to a distinct objective, the robot learns a policy that effectively balances the different goals.
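A toy example of the vector-reward idea: the reward function returns one component per objective, which a weight vector can then scalarise (objective names and weights here are illustrative assumptions):

```python
# Hypothetical multi-objective reward for a navigation task.
import numpy as np

def vector_reward(dist_to_goal, min_obstacle_dist, energy_used):
    r_goal = -dist_to_goal                  # objective 1: reach the goal
    r_safe = min(min_obstacle_dist, 1.0)    # objective 2: keep clearance
    r_eff = -energy_used                    # objective 3: save energy
    return np.array([r_goal, r_safe, r_eff])

weights = np.array([1.0, 0.5, 0.1])         # trade-off between objectives
scalar_r = float(weights @ vector_reward(2.0, 0.3, 0.05))
```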
arXiv Detail & Related papers (2023-12-13T08:00:26Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
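As a simplified stand-in for the paper's learned, adaptive quantization, one can cluster the dataset's continuous actions and treat cluster indices as a discrete action space; the sketch below uses k-means purely for illustration:

```python
# Simplified action discretization: cluster the offline dataset's actions
# and map between continuous actions and discrete cluster ids. The paper
# learns the quantization; k-means is a stand-in here.
import numpy as np
from sklearn.cluster import KMeans

actions = np.random.uniform(-1, 1, size=(10_000, 6))  # offline dataset actions
codebook = KMeans(n_clusters=64, n_init=10).fit(actions)

def discretize(a):
    return int(codebook.predict(a.reshape(1, -1))[0])  # discrete action id

def decode(idx):
    return codebook.cluster_centers_[idx]              # back to continuous
```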
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Quality-Diversity Optimisation on a Physical Robot Through
Dynamics-Aware and Reset-Free Learning [4.260312058817663]
We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
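A hypothetical sketch of how a learned dynamics model and a recovery policy might interact in such a loop (all names and interfaces are illustrative, not the paper's API):

```python
# Illustrative safety-aware evaluation step: imagine the outcome with a
# learned dynamics model, skip predicted-unsafe controllers, and recover
# if the real rollout leaves the safe zone.
import numpy as np

def step_with_recovery(robot, dynamics_model, controller,
                       safe_centre, safe_radius, recovery_policy):
    predicted_disp = dynamics_model.predict(controller)  # imagined behaviour
    if np.linalg.norm(robot.xy + predicted_disp - safe_centre) > safe_radius:
        return None                        # skip controllers predicted unsafe
    outcome = robot.execute(controller)    # real-world rollout
    if np.linalg.norm(robot.xy - safe_centre) > safe_radius:
        recovery_policy.walk_back(robot, safe_centre)  # continuous learning
    return outcome
```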
arXiv Detail & Related papers (2023-04-24T13:24:00Z) - Domain Randomization for Robust, Affordable and Effective Closed-loop
Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety around contact and their adaptability.
We show how Domain Randomization (DR) can enhance RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
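A minimal sketch of the domain-randomization pattern, with made-up parameter names and ranges for a soft-robot simulator:

```python
# Resample simulator dynamics parameters each episode so the learned
# policy is robust across their range. Names and ranges are illustrative.
import random

def randomize_dynamics(sim):
    sim.set_param("youngs_modulus", random.uniform(0.5e5, 2.0e5))  # stiffness
    sim.set_param("damping",        random.uniform(0.01, 0.20))
    sim.set_param("mass_scale",     random.uniform(0.8, 1.2))

# For each training episode:
#     randomize_dynamics(sim)
#     run_episode(policy, sim)
```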
arXiv Detail & Related papers (2023-03-07T18:50:00Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings on robotic manipulation and navigation benchmarks show that this considerably reduces the number of generations required for QD optimization in these environments.
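A rough sketch of the prior-population idea under simplifying assumptions (solutions stored as dicts with a fitness field; both function names are hypothetical):

```python
# Pool solutions found while solving prior tasks into a prior population,
# then seed QD in an unseen task from it for few-shot adaptation.
import random

def build_prior_population(per_task_archives, size=256):
    pool = [sol for archive in per_task_archives for sol in archive]
    pool.sort(key=lambda s: s["fitness"], reverse=True)
    return pool[:size]                     # prior population

def init_qd(prior_population, n_init=64):
    return random.sample(prior_population, n_init)  # few-shot initialisation
```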
arXiv Detail & Related papers (2021-09-14T17:12:20Z) - Backprop-Free Reinforcement Learning with Active Neural Generative
Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
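A heavily simplified, single-layer illustration of learning from local prediction errors rather than backpropagated gradients (in the spirit of predictive coding; this is not the paper's ANGC architecture):

```python
# One generative layer predicts the observation from a latent state and
# updates its weights with a local, Hebbian-style error rule: no error
# signal is propagated back through other layers.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 8))    # generative weights: latent -> obs

def local_update(z, x, lr=0.01):
    """One local learning step that reduces prediction error."""
    global W
    x_pred = W @ z                         # top-down prediction
    err = x - x_pred                       # local prediction error
    W += lr * np.outer(err, z)             # local weight update
    return float(np.mean(err ** 2))

z = rng.normal(size=8)                     # latent state
x = rng.normal(size=16)                    # observation
for _ in range(100):
    mse = local_update(z, x)               # error shrinks over iterations
```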
arXiv Detail & Related papers (2021-07-10T19:02:27Z) - IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
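A toy, discrete-action sketch of the single-Q-function idea: push the soft advantage of expert actions up while keeping values bounded. This is an illustration only, not the paper's exact objective:

```python
# Tabular toy version of an inverse soft-Q style loss over expert data.
import numpy as np

def soft_value(Q_row, alpha=1.0):
    return alpha * np.log(np.sum(np.exp(Q_row / alpha)))  # soft V(s)

def iq_loss(Q, expert_transitions, gamma=0.99):
    """Q: array [n_states, n_actions]; transitions: (s, a, s') tuples."""
    loss = 0.0
    for s, a, s_next in expert_transitions:
        advantage = Q[s, a] - gamma * soft_value(Q[s_next])
        loss -= advantage                  # reward expert actions
        loss += soft_value(Q[s])           # keep values from exploding
    return loss / len(expert_transitions)

Q = np.zeros((4, 2))                       # toy MDP: 4 states, 2 actions
demos = [(0, 1, 1), (1, 1, 2), (2, 0, 3)]  # expert (s, a, s') transitions
loss = iq_loss(Q, demos)
```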
arXiv Detail & Related papers (2021-06-23T03:43:10Z) - Hyperparameter Auto-tuning in Self-Supervised Robotic Learning [12.193817049957733]
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
We propose an auto-tuning technique based on the Evidence Lower Bound (ELBO) for self-supervised reinforcement learning.
Our method can auto-tune online and yields the best performance at a fraction of the time and computational resources.
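A minimal sketch of ELBO-based auto-tuning used as an online stopping rule (the model interface here is a hypothetical assumption):

```python
# Monitor the ELBO during self-supervised training and stop when it
# plateaus, avoiding both insufficient and redundant learning.
def train_with_elbo_autotune(model, data_iter, patience=5, tol=1e-3):
    best, stale = float("-inf"), 0
    for batch in data_iter:
        elbo = model.train_step(batch)     # returns current ELBO estimate
        if elbo > best + tol:
            best, stale = elbo, 0          # still improving
        else:
            stale += 1
        if stale >= patience:              # plateaued: stop this phase
            break
    return best
```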
arXiv Detail & Related papers (2020-10-16T08:58:24Z) - Model-Based Quality-Diversity Search for Efficient Robot Learning [28.049034339935933]
We propose a novelty-based Quality-Diversity (QD) algorithm.
A forward-model network is trained concurrently with the repertoire and is used to avoid executing unpromising actions in the novelty search process.
Experiments show that enhancing a QD algorithm with such a forward model improves the sample-efficiency and performance of the evolutionary process and the skill adaptation.
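A hypothetical sketch of using the forward model to screen candidates before execution in the novelty search (function and field names are illustrative):

```python
# Pre-screen candidate controllers with a learned forward model and only
# execute the predicted-most-novel ones on the robot.
import numpy as np

def screen_candidates(candidates, forward_model, archive_descriptors, k=5):
    """Keep the k candidates whose predicted behaviour is most novel.

    archive_descriptors: array [n_archive, d] of behaviour descriptors.
    """
    def novelty(c):
        pred = forward_model.predict(c)    # predicted behaviour descriptor
        dists = np.linalg.norm(archive_descriptors - pred, axis=1)
        return float(np.sort(dists)[:3].mean())  # mean distance to 3 NNs
    return sorted(candidates, key=novelty, reverse=True)[:k]
```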
arXiv Detail & Related papers (2020-08-11T09:02:18Z)