Related papers: Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

URL: http://arxiv.org/abs/2507.01243v1
Date: Tue, 01 Jul 2025 23:31:36 GMT
Title: Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion
Authors: Ziang Zheng, Guojian Zhan, Shiqi Liu, Yao Lyu, Tao Zhang, Shengbo Eben Li
Abstract summary: We propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors, JumpER progressively refines and enhances guidance, thereby stabilizing exploration and policy optimization. The resulting policy effectively handles challenging scenarios that traditional methods struggle to conquer, including wide gaps up to 60 cm, irregularly spaced stairs, and stepping stones with distances varying from 15 cm to 35 cm.
Score: 11.692916662706361
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors through iterative bootstrapping of previously learned policies, JumpER progressively refines and enhances guidance, thereby stabilizing exploration and policy optimization without relying on external expert priors or handcrafted reward shaping. Specifically, when integrated with a structured three-stage curriculum that incrementally evolves action modality, observation space, and task objective, JumpER enables quadruped robots to achieve robust monopedal hopping on unpredictable terrains for the first time. Remarkably, the resulting policy effectively handles challenging scenarios that traditional methods struggle to conquer, including wide gaps up to 60 cm, irregularly spaced stairs, and stepping stones with distances varying from 15 cm to 35 cm. JumpER thus provides a principled and scalable approach for addressing locomotion tasks under the dual challenges of extreme underactuation and extreme terrains.

Related papers

Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning [5.760394464143113]
We propose a novel way to endow navigation policies with robustness by a training process that models obstacles as adversarial agents. We call this method Hierarchical policies via Quantal response Adversarial Reinforcement Learning (Hi-QARL).
arXiv Detail & Related papers (2025-03-14T14:54:02Z)
Training Directional Locomotion for Quadrupedal Low-Cost Robotic Systems via Deep Reinforcement Learning [4.669957449088593]
We present Deep Reinforcement Learning training of directional locomotion for low-cost quadrupedal robots in the real world. We exploit randomization of the heading that the robot must follow to foster exploration of action-state transitions. Changing the heading at episode resets to the current yaw plus a random value drawn from a normal distribution yields policies able to follow complex trajectories.
arXiv Detail & Related papers (2025-03-14T03:53:01Z)
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds [35.62230804783507]
Existing learning-based approaches often struggle on complex terrains due to sparse foothold rewards and inefficient learning processes. We introduce BeamDojo, a reinforcement learning framework designed for enabling agile humanoid locomotion on sparse footholds. We show that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world.
arXiv Detail & Related papers (2025-02-14T18:42:42Z)
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds [51.05639500325598]
We introduce QuadrupedGPT, designed to follow diverse commands with agility comparable to that of a pet. Our agent shows proficiency in handling diverse tasks and intricate instructions, representing a significant step toward the development of versatile quadruped agents.
arXiv Detail & Related papers (2024-06-24T12:14:24Z)
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots. We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z)
Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion [66.69666636971922]
We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training. APRL enables a quadrupedal robot to efficiently learn to walk entirely in the real world within minutes.
arXiv Detail & Related papers (2023-10-26T17:51:46Z)
Learning and Adapting Agile Locomotion Skills by Transferring Experience [71.8926510772552]
We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments.
arXiv Detail & Related papers (2023-04-19T17:37:54Z)
Robust and Versatile Bipedal Jumping Control through Reinforcement Learning [141.56016556936865]
This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a reinforcement learning framework for training a robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions. We develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to a short-term I/O history.
arXiv Detail & Related papers (2023-02-19T01:06:09Z)
Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion [29.853927354893656]
We propose a novel RL-based approach that contains an evolutionary foot trajectory generator. The generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide the policy learning. We deploy the controller learned in simulation on a 12-DoF quadrupedal robot, and it can successfully traverse challenging scenarios with efficient gaits.
arXiv Detail & Related papers (2021-09-14T02:51:50Z)
Learning Agile Locomotion via Adversarial Training [59.03007947334165]
In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape. We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort. In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility.
arXiv Detail & Related papers (2020-08-03T01:20:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.