Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
- URL: http://arxiv.org/abs/2402.06570v2
- Date: Mon, 3 Jun 2024 20:02:33 GMT
- Title: Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
- Authors: Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson
- Abstract summary: Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies.
To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill.
We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments.
- Score: 34.40439673925125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
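To make the two components concrete, below is a minimal PyTorch sketch of the idea: a hypernetwork maps a morphology embedding to the weights of a small per-robot MLP, and a distillation loss regresses the generated MLP's actions onto a frozen teacher's. All module names, dimensions, and the squared-error loss are illustrative assumptions, not the paper's implementation.
```python
# Minimal sketch (not the paper's code): a morphology-conditioned hypernetwork
# generates the weights of a small per-robot MLP policy, which is trained by
# distillation from a teacher policy. Sizes and names are illustrative.
import torch
import torch.nn as nn

class HyperMLPPolicy(nn.Module):
    def __init__(self, morph_dim=64, obs_dim=32, act_dim=8, hidden=128, hyper_hidden=256):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
        # Hypernetwork: morphology embedding -> flat parameter vector of a 2-layer MLP.
        n_params = (obs_dim * hidden + hidden) + (hidden * act_dim + act_dim)
        self.hyper = nn.Sequential(
            nn.Linear(morph_dim, hyper_hidden), nn.ReLU(),
            nn.Linear(hyper_hidden, n_params),
        )

    def generate(self, morph_emb):
        # Run the hypernetwork ONCE per robot; the result defines that robot's MLP.
        p = self.hyper(morph_emb)
        i = self.obs_dim * self.hidden
        w1 = p[:i].view(self.hidden, self.obs_dim)
        b1 = p[i:i + self.hidden]
        j = i + self.hidden
        w2 = p[j:j + self.hidden * self.act_dim].view(self.act_dim, self.hidden)
        b2 = p[j + self.hidden * self.act_dim:]
        return w1, b1, w2, b2

    @staticmethod
    def act(params, obs):
        # Per-timestep inference uses only the cheap generated MLP, never the hypernetwork.
        w1, b1, w2, b2 = params
        h = torch.relu(obs @ w1.T + b1)
        return h @ w2.T + b2

def distill_loss(params, obs, teacher_actions):
    # Distillation as action regression onto a frozen teacher (e.g., the TF policy).
    return ((HyperMLPPolicy.act(params, obs) - teacher_actions) ** 2).mean()

# Usage: generate once per robot, then act cheaply every timestep.
policy = HyperMLPPolicy()
params = policy.generate(torch.randn(64))            # once per morphology
action = HyperMLPPolicy.act(params, torch.randn(5, 32))
```
The knowledge-decoupling argument is visible in the structure: the hypernetwork (inter-task knowledge) runs once per morphology, while per-timestep control (intra-task knowledge) touches only the small generated MLP.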
Related papers
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy: Sparse Diffusion Policy (SDP).
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code can be found at https://forrest-110.io/sparse_diffusion_policy/.
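The "selectively activates experts" idea above can be illustrated with a generic top-k mixture-of-experts routing sketch; this is an assumption about the mechanism, not SDP's actual architecture.
```python
# Hypothetical sketch of selective expert activation via generic top-k
# mixture-of-experts routing; SDP's actual architecture may differ.
import torch
import torch.nn as nn

class TopKExperts(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, n_experts)  # scores every expert per input
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x):
        scores = self.router(x)                       # (batch, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)      # keep only the k best experts
        weights = torch.softmax(topv, dim=-1)
        out = torch.zeros_like(x)
        for j in range(self.k):
            idx = topi[:, j]
            for e in idx.unique().tolist():           # run just the chosen experts
                mask = idx == e
                out[mask] = out[mask] + weights[mask, j:j + 1] * self.experts[e](x[mask])
        return out

moe = TopKExperts()
y = moe(torch.randn(4, 64))  # only 2 of 8 experts execute per input
```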
arXiv Detail & Related papers (2024-07-01T17:59:56Z)
- Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) gradient boosting machines (GBMs), (ii) explainable boosting machines (EBMs), and (iii) symbolic policies.
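A hedged sketch of the distillation step for case (i): behavioral cloning of an expert's actions into a GBM with scikit-learn. The random data stands in for expert rollouts; the paper's exact procedure may differ.
```python
# Hedged sketch of distilling a policy into a GBM: behavioral cloning of expert
# actions with scikit-learn. Random data stands in for expert rollouts.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
states = rng.normal(size=(5000, 10))                                  # expert rollout states
actions = np.tanh(states[:, :4] + 0.1 * rng.normal(size=(5000, 4)))   # stand-in expert actions

# GBMs are single-output, so wrap one per action dimension.
gbm_policy = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=100, max_depth=3))
gbm_policy.fit(states, actions)
print("distillation MSE:", ((gbm_policy.predict(states) - actions) ** 2).mean())
```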
arXiv Detail & Related papers (2024-03-21T11:54:45Z)
- ManyQuadrupeds: Learning a Single Locomotion Policy for Diverse Quadruped Robots [4.557963624437784]
We show that drawing inspiration from animal motor control allows us to effectively train a single locomotion policy for quadruped robots.
Our policy modulates a representation of the Central Pattern Generator (CPG) in the spinal cord.
We observe robust performance, even when adding a 15 kg load, equivalent to 125% of the A1 robot's nominal mass.
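A hypothetical sketch of what "modulating a CPG representation" can look like: the policy outputs per-leg amplitude and frequency offsets for simple oscillators. The actual formulation in the paper may differ.
```python
# Hypothetical sketch of a policy-modulated CPG: the policy outputs per-leg
# amplitude and frequency offsets for simple oscillators.
import numpy as np

def cpg_step(phase, policy_out, dt=0.01, base_freq=2.0):
    """Advance one oscillator per leg and return joint-space targets."""
    d_amp, d_freq = policy_out[:4], policy_out[4:]           # policy modulates the CPG
    phase = (phase + 2 * np.pi * (base_freq + d_freq) * dt) % (2 * np.pi)
    targets = (1.0 + d_amp) * np.sin(phase)                  # amplitude-modulated rhythm
    return phase, targets

phase = np.zeros(4)                                          # one phase per leg
phase, q = cpg_step(phase, policy_out=np.zeros(8))           # zero modulation = nominal gait
```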
arXiv Detail & Related papers (2023-10-16T15:06:16Z)
- Universal Morphology Control via Contextual Modulation [52.742056836818136]
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control.
Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies.
We propose a hierarchical architecture that better models the dependency of a robot's control policy on its morphology context via contextual modulation.
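One common way to realize contextual modulation is FiLM-style per-layer scale and shift generated from the morphology context; the sketch below is illustrative, not the paper's architecture.
```python
# Illustrative FiLM-style contextual modulation: the morphology context produces
# a per-layer scale and shift applied to the state pathway. Not the paper's code.
import torch
import torch.nn as nn

class ModulatedLayer(nn.Module):
    def __init__(self, dim=64, ctx_dim=32):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.film = nn.Linear(ctx_dim, 2 * dim)   # context -> (scale, shift)

    def forward(self, h, ctx):
        scale, shift = self.film(ctx).chunk(2, dim=-1)
        return torch.relu(self.fc(h) * (1 + scale) + shift)

layer = ModulatedLayer()
out = layer(torch.randn(4, 64), torch.randn(4, 32))  # same weights, context-dependent behavior
```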
arXiv Detail & Related papers (2023-02-22T00:04:12Z)
- Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning [0.38073142980732994]
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model.
We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best-performing set.
The online weighted Q-ensemble exhibited lower overall variance and superior results compared with Q-average ensembles.
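A minimal sketch of the weighted-ensemble idea: members trained with different hyperparameters are combined by performance-based weights updated online. The weighting rule shown (softmax over recent returns) is an assumption, not necessarily the paper's update.
```python
# Minimal sketch of a weighted Q-ensemble; the softmax-over-returns weighting
# rule is an assumption.
import numpy as np

class WeightedQEnsemble:
    def __init__(self, q_functions, lr=0.1):
        self.qs = q_functions                       # one Q-function per hyperparameter set
        self.w = np.ones(len(q_functions)) / len(q_functions)
        self.lr = lr

    def act(self, state):
        # Combine members' action values with the current weights.
        q = sum(w * qf(state) for w, qf in zip(self.w, self.qs))
        return int(np.argmax(q))

    def update_weights(self, recent_returns):
        # Upweight members whose recent episodes returned more.
        self.w = np.exp(self.lr * np.asarray(recent_returns))
        self.w /= self.w.sum()

ens = WeightedQEnsemble([lambda s: np.ones(3), lambda s: np.arange(3.0)])
a = ens.act(state=None)
ens.update_weights([1.0, 2.5])
```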
arXiv Detail & Related papers (2022-09-29T19:57:43Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
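A short sketch of the task-specific feature transformation idea: a frozen shared layer plus a per-task scale and shift, so each new task adds only a few parameters. Names and the exact transform are assumptions, not the paper's design.
```python
# Sketch of a task-specific feature transformation for continual learning:
# a frozen shared layer plus a per-task scale and shift (2*dim parameters
# per task). The exact transform used by the paper may differ.
import torch
import torch.nn as nn

class TaskTransformLayer(nn.Module):
    def __init__(self, dim=128, n_tasks=5):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)                  # shared base stays frozen
        self.scale = nn.Parameter(torch.ones(n_tasks, dim))
        self.shift = nn.Parameter(torch.zeros(n_tasks, dim))

    def forward(self, x, task_id):
        # Only the tiny per-task parameters adapt to each new task.
        return torch.relu(self.base(x) * self.scale[task_id] + self.shift[task_id])

layer = TaskTransformLayer()
out = layer(torch.randn(4, 128), task_id=2)
```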
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models [4.84279798426797]
Multi-robot systems can benefit from reinforcement learning (RL) algorithms that learn behaviours in a small number of trials.
We present a novel multi-agent model-based RL algorithm: Multi-Agent Model-Based Policy Optimization (MAMBPO).
arXiv Detail & Related papers (2021-03-05T13:37:23Z)
- Learning Whole-body Motor Skills for Humanoids [25.443880385966114]
This paper presents a hierarchical framework for Deep Reinforcement Learning that acquires motor skills for a variety of push recovery and balancing behaviors.
The policy is trained in a physics simulator with a realistic robot model and low-level impedance control, making the learned skills easy to transfer to real robots.
arXiv Detail & Related papers (2020-02-07T19:40:59Z)