Related papers: STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion

URL: http://arxiv.org/abs/2502.04692v3
Date: Wed, 12 Feb 2025 01:02:51 GMT
Title: STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion
Authors: Zhenwei Wu, Jinxiong Lu, Yuxiao Chen, Yunxin Liu, Yueting Zhuang, Luhui Hu
Abstract summary: We introduce STRIDE, a novel framework built on agentic engineering to automate reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks. By combining structured principles of agentic engineering with large language models (LLMs) for code-writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates. Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving an average improvement of around 250% in efficiency and task performance.
Score: 33.91518509518502
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Humanoid robotics presents significant challenges in artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems. Designing effective reward functions for deep reinforcement learning (DRL) in this domain remains a critical bottleneck, demanding extensive manual effort, domain expertise, and iterative refinement. To overcome these challenges, we introduce STRIDE, a novel framework built on agentic engineering to automate reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks. By combining the structured principles of agentic engineering with large language models (LLMs) for code-writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates. Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving an average improvement of around 250% in efficiency and task performance. Using STRIDE-generated rewards, simulated humanoid robots achieve sprint-level locomotion across complex terrains, highlighting its ability to advance DRL workflows and humanoid robotics research.

Related papers

Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models [5.2364456910271935]
Reinforcement Learning (RL) enables agents to autonomously optimize complex behaviors through interaction and reward signals. In this work, we propose an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility.
arXiv Detail & Related papers (2025-03-06T10:08:44Z)
Structural Optimization of Lightweight Bipedal Robot via SERL [6.761861053481078]
This paper introduces the SERL (Structure Evolution Reinforcement Learning) algorithm, which combines reinforcement learning for locomotion tasks with evolution algorithms. We successfully designed a bipedal robot named Wow Orin, where the optimal leg length is obtained through optimization based on body structure and motor torque.
arXiv Detail & Related papers (2024-08-28T08:34:05Z)
MuTT: A Multimodal Trajectory Transformer for Robot Skills [14.84252843639553]
MuTT is a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills. We pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. This approach facilitates the optimization of robot skill parameters for the current environment, without the need for real-world executions.
arXiv Detail & Related papers (2024-07-22T14:18:52Z)
Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
I-CTRL: Imitation to Control Humanoid Robots Through Constrained Reinforcement Learning [8.97654258232601]
We develop a framework to control humanoid robots through bounded residual reinforcement learning (I-CTRL). I-CTRL excels in motion imitation with simple and unique rewards that generalize across five robots. Our framework introduces an automatic priority scheduler to manage large-scale motion datasets.
arXiv Detail & Related papers (2024-05-14T16:12:27Z)
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [82.46975428739329]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment. We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world. Recent methods aim to mitigate misalignment by learning reward functions from human preferences. We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z)
GLSO: Grammar-guided Latent Space Optimization for Sample-efficient Robot Design Automation [16.96128900256427]
We present Grammar-guided Latent Space Optimization (GLSO), a framework that transforms design automation into a low-dimensional continuous optimization problem by training a graph variational autoencoder (VAE) to learn a mapping between the graph-structured design space and a continuous latent space.
arXiv Detail & Related papers (2022-09-23T17:48:24Z)
Active Predicting Coding: Brain-Inspired Reinforcement Learning for Sparse Reward Robotic Control Problems [79.07468367923619]
We propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC). We design an agent built completely from powerful predictive coding/processing circuits that facilitate dynamic, online learning from sparse rewards. We show that our proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with or outperforms several powerful backprop-based RL approaches.
arXiv Detail & Related papers (2022-09-19T16:49:32Z)
Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems. However, training RL agents to solve robotics tasks still remains challenging. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
Bioinspired Bipedal Locomotion Control for Humanoid Robotics Based on EACO [1.0152838128195467]
This work promotes the global search capability and convergence rate of EACO as applied to humanoid robots in real time. We put a special focus on the EACO algorithm on a wide range of problems, from ACO, real-coded GAs, GAs with neural networks (NNs), and particle swarm optimization (PSO) to complex robotics systems.
arXiv Detail & Related papers (2020-10-09T09:43:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.