Efficiently Learning Small Policies for Locomotion and Manipulation
- URL: http://arxiv.org/abs/2210.00140v1
- Date: Fri, 30 Sep 2022 23:49:00 GMT
- Title: Efficiently Learning Small Policies for Locomotion and Manipulation
- Authors: Shashank Hegde and Gaurav S. Sukhatme
- Abstract summary: We leverage graph hyper networks to learn graph hyper policies trained with off-policy reinforcement learning.
We show that our method can be appended to any off-policy reinforcement learning algorithm.
We provide a method to select the best architecture, given a constraint on the number of parameters.
- Score: 12.340412143459869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural control of memory-constrained, agile robots requires small, yet highly
performant models. We leverage graph hyper networks to learn graph hyper
policies trained with off-policy reinforcement learning, resulting in networks
that are two orders of magnitude smaller than commonly used networks yet encode
policies comparable to those encoded by much larger networks trained on the
same task. We show that our method can be appended to any off-policy
reinforcement learning algorithm, without any change in hyperparameters, by
showing results across locomotion and manipulation tasks. Further, we obtain an
array of working policies, with differing numbers of parameters, allowing us to
pick an optimal network for the memory constraints of a system. Training
multiple policies with our method is as sample efficient as training a single
policy. Finally, we provide a method to select the best architecture, given a
constraint on the number of parameters. Project website:
https://sites.google.com/usc.edu/graphhyperpolicy
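The abstract's mechanism lends itself to a short illustration. Below is a minimal PyTorch-style sketch of the core idea: a hypernetwork that maps a per-layer architecture encoding to the weights of a small policy MLP, plus a simple parameter-budget filter in the spirit of the architecture-selection step. All names, shapes, and the encoding scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): one hypernetwork emits the weights
# of small policy MLPs of differing sizes. The per-layer encoding `enc`, the
# class/function names, and the weight layout are assumptions.
import torch
import torch.nn as nn


class HyperNetwork(nn.Module):
    """Maps a per-layer architecture encoding to that layer's weights and bias."""

    def __init__(self, enc_dim: int, max_in: int, max_out: int, hidden: int = 128):
        super().__init__()
        self.max_in, self.max_out = max_in, max_out
        self.net = nn.Sequential(
            nn.Linear(enc_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, max_in * max_out + max_out),
        )

    def forward(self, enc: torch.Tensor, fan_in: int, fan_out: int):
        flat = self.net(enc)
        w = flat[: self.max_in * self.max_out].view(self.max_out, self.max_in)
        b = flat[self.max_in * self.max_out :]
        return w[:fan_out, :fan_in], b[:fan_out]  # crop to the requested layer size


def policy_forward(hyper: HyperNetwork, layer_encs, sizes, obs):
    """Evaluate a policy whose weights are generated on the fly.

    `sizes` is [obs_dim, hidden..., act_dim]; `layer_encs` has one encoding per layer.
    """
    x = obs
    for i, enc in enumerate(layer_encs):
        w, b = hyper(enc, fan_in=sizes[i], fan_out=sizes[i + 1])
        x = x @ w.t() + b
        if i < len(layer_encs) - 1:
            x = torch.relu(x)
    return torch.tanh(x)  # bounded continuous actions


def n_params(sizes):
    """Parameter count of the generated MLP (weights + biases)."""
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))


def pick_architecture(candidate_sizes, param_budget, score_fn):
    """Keep candidates under the memory budget; return the best-scoring one."""
    feasible = [s for s in candidate_sizes if n_params(s) <= param_budget]
    return max(feasible, key=score_fn)
```

Because a single hypernetwork serves every candidate architecture, one training run can yield the array of differently sized working policies the abstract describes; under a memory constraint, selection then reduces to filtering by parameter count and ranking by a score such as estimated return.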
Related papers
- Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance [66.51390591688802]
Value-Guided Policy Steering (V-GPS) is compatible with a wide range of different generalist policies, without needing to fine-tune or even access the weights of the policy.
We show that the same value function can improve the performance of five different state-of-the-art policies with different architectures (see the re-ranking sketch after this list).
arXiv Detail & Related papers (2024-10-17T17:46:26Z)
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP).
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and codes can be found in https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- HyperPPO: A scalable method for finding small policies for robotic control [14.789594427174052]
HyperPPO is an on-policy reinforcement learning algorithm that estimates the weights of multiple neural networks simultaneously.
Our method estimates weights for networks that are much smaller than commonly used networks yet encode highly performant policies.
We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor.
arXiv Detail & Related papers (2023-09-28T17:58:26Z)
- Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations [2.3488056916440856]
We propose a novel algorithm to find efficient low-rank subnetworks.
These subnetworks are determined and adapted already during the training phase (see the low-rank layer sketch after this list).
Our method automatically and dynamically adapts the ranks during training to achieve a desired approximation accuracy.
arXiv Detail & Related papers (2022-05-26T18:18:12Z)
- DAAS: Differentiable Architecture and Augmentation Policy Search [107.53318939844422]
This work considers the possible coupling between neural architectures and data augmentation and proposes an effective algorithm jointly searching for them.
Our approach achieves 97.91% accuracy on CIFAR-10 and 76.6% Top-1 accuracy on the ImageNet dataset, showing the outstanding performance of our search algorithm.
arXiv Detail & Related papers (2021-09-30T17:15:17Z)
- Coordinated Reinforcement Learning for Optimizing Mobile Networks [6.924083445159127]
We show how to use coordination graphs and reinforcement learning in a complex application involving hundreds of cooperating agents.
We show empirically that coordinated reinforcement learning outperforms other methods.
arXiv Detail & Related papers (2021-09-30T14:46:18Z)
- Large Scale Distributed Collaborative Unlabeled Motion Planning with Graph Policy Gradients [122.85280150421175]
We present a learning method to solve the unlabeled motion planning problem with motion and space constraints in 2D space for a large number of robots.
We employ a graph neural network (GNN) to parameterize policies for the robots.
arXiv Detail & Related papers (2021-02-11T21:57:43Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- Randomized Policy Learning for Continuous State and Action MDPs [8.109579454896128]
We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces.
We show the numerical performance on challenging environments and compare them with deep neural network based algorithms.
arXiv Detail & Related papers (2020-06-08T02:49:47Z)
- Multi-Task Reinforcement Learning with Soft Modularization [25.724764855681137]
Multi-task learning is a very challenging problem in reinforcement learning.
We introduce an explicit modularization technique on policy representation to alleviate this optimization issue.
We show our method improves both sample efficiency and performance over strong baselines by a large margin.
arXiv Detail & Related papers (2020-03-30T17:47:04Z)
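For the value-guided steering entry above, the core loop is compact enough to sketch. The following is an illustrative re-ranking routine in the spirit of V-GPS, assuming a frozen generalist policy we can sample from and a learned Q-function; the names and signatures are hypothetical, not the V-GPS API.

```python
# Illustrative sketch (hypothetical names, not the V-GPS API): draw several
# candidate actions from a frozen generalist policy and execute the one a
# learned value function scores highest.
import torch


@torch.no_grad()
def steer(policy_sample, q_value, obs: torch.Tensor, num_samples: int = 16) -> torch.Tensor:
    """policy_sample(obs) -> action tensor; q_value(obs, action) -> scalar tensor."""
    candidates = [policy_sample(obs) for _ in range(num_samples)]
    scores = torch.stack([q_value(obs, a) for a in candidates])
    return candidates[int(torch.argmax(scores))]
```

Because only sampled actions are re-ranked, the policy's weights are never touched, which is why the same value function can steer policies with different architectures.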
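Similarly, for the low-rank lottery-tickets entry, a factorized layer makes the idea concrete. This sketch uses my own assumptions (a plain SVD-based truncation rule); it is not the paper's algorithm, which evolves the factors via matrix differential equations.

```python
# Illustrative low-rank layer: W is kept factored as U @ S @ V^T and the rank
# can be truncated during training when singular values fall below a
# tolerance. Names and the truncation rule are assumptions for illustration.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, rank) / rank**0.5)
        self.S = nn.Parameter(torch.eye(rank))
        self.V = nn.Parameter(torch.randn(in_dim, rank) / rank**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.V @ self.S @ self.U.t()  # rank-r product; never forms full W

    @torch.no_grad()
    def truncate(self, tol: float = 1e-2):
        """Drop directions whose singular values fall below tol."""
        u, s, vt = torch.linalg.svd(self.S)
        keep = int((s > tol).sum().clamp(min=1))
        self.U = nn.Parameter(self.U @ u[:, :keep])
        self.S = nn.Parameter(torch.diag(s[:keep]))
        self.V = nn.Parameter(self.V @ vt.t()[:, :keep])
```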
This list is automatically generated from the titles and abstracts of the papers on this site.