Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot
- URL: http://arxiv.org/abs/2401.11372v2
- Date: Mon, 23 Sep 2024 22:58:39 GMT
- Title: Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot
- Authors: Xinda Qi, Dong Chen, Zhaojian Li, Xiaobo Tan,
- Abstract summary: Back-stepping Experience Replay (BER) is compatible with arbitrary off-policy reinforcement learning algorithms.
We present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot.
- Score: 15.005962159112002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a distillation of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.
Related papers
- Research on Autonomous Robots Navigation based on Reinforcement Learning [13.559881645869632]
We use the Deep Q Network (DQN) and Proximal Policy Optimization (PPO) models to optimize the path planning and decision-making process.
We have verified the effectiveness and robustness of these models in various complex scenarios.
arXiv Detail & Related papers (2024-07-02T00:44:06Z) - Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z) - A Novel Feature Learning-based Bio-inspired Neural Network for Real-time
Collision-free Rescue of Multi-Robot Systems [5.478000072204037]
A bio-inspired neural network is proposed to generate a rescue path in complex and dynamic environments.
The proposed FLBBINN aims to reduce the computational complexity of the neural network-based approach.
The results show that the proposed FLBBINN would significantly improve the speed, efficiency, and optimality for rescue operations.
arXiv Detail & Related papers (2024-03-13T04:43:10Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous
Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills withReinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
arXiv Detail & Related papers (2023-09-06T19:05:31Z) - Learning Bipedal Walking for Humanoids with Current Feedback [5.429166905724048]
We present an approach for overcoming the sim2real gap issue for humanoid robots arising from inaccurate torque-tracking at the actuator level.
Our approach successfully trains a unified, end-to-end policy in simulation that can be deployed on a real HRP-5P humanoid robot to achieve bipedal locomotion.
arXiv Detail & Related papers (2023-03-07T08:16:46Z) - Active Predicting Coding: Brain-Inspired Reinforcement Learning for
Sparse Reward Robotic Control Problems [79.07468367923619]
We propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC)
We design an agent built completely from powerful predictive coding/processing circuits that facilitate dynamic, online learning from sparse rewards.
We show that our proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with or outperforms several powerful backprop-based RL approaches.
arXiv Detail & Related papers (2022-09-19T16:49:32Z) - Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse
Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamic and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z) - Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with
Deep Reinforcement Learning [42.525696463089794]
Model Predictive Actor-Critic (MoPAC) is a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization as to mitigate model bias.
MoPAC guarantees optimal skill learning up to an approximation error and reduces necessary physical interaction with the environment.
arXiv Detail & Related papers (2021-03-25T13:50:24Z) - Online Body Schema Adaptation through Cost-Sensitive Active Learning [63.84207660737483]
The work was implemented in a simulation environment, using the 7DoF arm of the iCub robot simulator.
A cost-sensitive active learning approach is used to select optimal joint configurations.
The results show cost-sensitive active learning has similar accuracy to the standard active learning approach, while reducing in about half the executed movement.
arXiv Detail & Related papers (2021-01-26T16:01:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.