Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
- URL: http://arxiv.org/abs/2402.13201v1
- Date: Tue, 20 Feb 2024 18:10:39 GMT
- Title: Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
- Authors: Orhan Eren Akgün, Néstor Cuevas, Matheus Farias, Daniel Garces
- Abstract summary: Resource-constrained robotic platforms are useful for tasks that require low-cost hardware alternatives.
We propose a method for making imitation learning deployable onto resource-constrained robotic platforms.
We show that our method achieves natural-looking gaits for Bittle, a resource-constrained quadruped robot.
- Score: 0.9217021281095907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Resource-constrained robotic platforms are particularly useful for tasks that
require low-cost hardware alternatives due to the risk of losing the robot,
like in search-and-rescue applications, or the need for a large number of
devices, like in swarm robotics. For this reason, it is crucial to find
mechanisms for adapting reinforcement learning techniques to the constraints
imposed by lower computational power and smaller memory capacities of these
ultra low-cost robotic platforms. We try to address this need by proposing a
method for making imitation learning deployable onto resource-constrained
robotic platforms. Here we cast the imitation learning problem as a conditional
sequence modeling task and we train a decision transformer using expert
demonstrations augmented with a custom reward. Then, we compress the resulting
generative model using software optimization schemes, including quantization
and pruning. We test our method in simulation using Isaac Gym, a realistic
physics simulation environment designed for reinforcement learning. We
empirically demonstrate that our method achieves natural-looking gaits for
Bittle, a resource-constrained quadruped robot. We also run multiple
simulations to show the effects of pruning and quantization on the performance
of the model. Our results show that quantization (down to 4 bits) and pruning
reduce model size by around 30% while maintaining a competitive reward, making
the model deployable in a resource-constrained system.
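The compression pipeline the abstract describes, magnitude pruning followed by quantization down to 4 bits, can be illustrated on a single weight matrix. This is a minimal NumPy sketch under assumed details (uniform symmetric quantization, unstructured magnitude pruning, a random toy matrix), not the authors' implementation:

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform symmetric quantization of a weight matrix to `bits` bits."""
    levels = 2 ** (bits - 1) - 1              # 7 integer levels per sign for 4 bits
    scale = np.abs(w).max() / levels          # one scale for the whole matrix
    q = np.round(w / scale).astype(np.int8)   # integer codes in [-levels, levels]
    return q, scale                           # dequantize with q * scale

def prune(w, sparsity=0.3):
    """Unstructured magnitude pruning: zero the smallest `sparsity` fraction."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # stand-in for a layer's weights

w_pruned = prune(w, sparsity=0.3)
q, scale = quantize(w_pruned, bits=4)
w_deq = q * scale                             # reconstruction used at inference time

print("sparsity:", np.mean(w_pruned == 0))
print("max reconstruction error:", np.abs(w_deq - w_pruned).max())
```

In a real deployment the pruned-and-quantized codes would be packed and shipped to the microcontroller; the round-off error above bounds the per-weight distortion, which the paper's simulations show still yields a competitive reward.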
Related papers
- DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model [72.66465487508556]
DiffGen is a novel framework that integrates differentiable physics simulation, differentiable rendering, and a vision-language model.
It can generate realistic robot demonstrations by minimizing the distance between the embedding of the language instruction and the embedding of the simulated observation.
Experiments demonstrate that with DiffGen, we could efficiently and effectively generate robot data with minimal human effort or training time.
arXiv Detail & Related papers (2024-05-12T15:38:17Z)
- Learning Quadruped Locomotion Using Differentiable Simulation [31.80380408663424]
Differentiable simulation promises fast convergence and stable training, but applying it to contact-rich legged locomotion remains difficult.
This work proposes a new differentiable simulation framework to overcome these challenges.
Our framework enables learning quadruped walking in simulation in minutes without parallelization.
arXiv Detail & Related papers (2024-03-21T22:18:59Z)
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
- Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion [2.7052274816160966]
We propose a novel reinforcement learning framework that trains neural network controllers for complex robotic systems using both rewards and constraints.
The learning framework is applied to train controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains.
arXiv Detail & Related papers (2023-08-24T03:06:20Z)
- Learning Bipedal Walking for Humanoids with Current Feedback [5.429166905724048]
We present an approach for overcoming the sim2real gap issue for humanoid robots arising from inaccurate torque-tracking at the actuator level.
Our approach successfully trains a unified, end-to-end policy in simulation that can be deployed on a real HRP-5P humanoid robot to achieve bipedal locomotion.
arXiv Detail & Related papers (2023-03-07T08:16:46Z)
- Obstacle Avoidance for Robotic Manipulator in Joint Space via Improved Proximal Policy Optimization [6.067589886362815]
In this paper, we train a deep neural network via an improved Proximal Policy Optimization (PPO) algorithm to map from task space to joint space for a 6-DoF manipulator.
Since training for such a task on a real robot is time-consuming and strenuous, we develop a simulation environment to train the model.
Experimental results show that with our method, the robot can track a single target or reach multiple targets in unstructured environments.
arXiv Detail & Related papers (2022-10-03T10:21:57Z)
- Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamics and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamics models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across robots.
We propose a novel method named REvolveR that uses continuous evolutionary models for robotic policy transfer, implemented in a physics simulator.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
- Robot Learning from Randomized Simulations: A Review [59.992761565399185]
Deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data.
State-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive.
We focus on 'domain randomization', a technique for learning from randomized simulations.
arXiv Detail & Related papers (2021-11-01T13:55:41Z)
- Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms [60.59764170868101]
Reinforcement learning methods can achieve strong performance but require a large amount of training data collected on the same robotic platform.
We formulate policy adaptation across platforms as a few-shot meta-learning problem in which the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z)
- robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement Learning on Real and Simulated Robots [0.5161531917413708]
We propose robo-gym, an open-source toolkit for increasing the use of deep reinforcement learning with real robots.
We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot.
We showcase the capabilities and the effectiveness of the framework with two real world applications featuring industrial robots.
arXiv Detail & Related papers (2020-07-06T13:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.