Related papers: RT-HCP: Dealing with Inference Delays and Sample Efficiency to Learn Directly on Robotic Platforms

URL: http://arxiv.org/abs/2509.06714v1
Date: Mon, 08 Sep 2025 14:09:33 GMT
Title: RT-HCP: Dealing with Inference Delays and Sample Efficiency to Learn Directly on Robotic Platforms
Authors: Zakariae El Asri, Ibrahim Laiche, Clément Rambour, Olivier Sigaud, Nicolas Thome
Abstract summary: Learning a controller directly on the robot requires extreme sample efficiency. We propose RT-HCP, an algorithm that offers an excellent trade-off between performance, sample efficiency and inference time. We validate the superiority of RT-HCP with experiments where we learn a controller directly on a simple but high-frequency pendulum platform.
Score: 16.18687520299694
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning a controller directly on the robot requires extreme sample efficiency. Model-based reinforcement learning (RL) methods are the most sample efficient, but their inference time is often too long to meet the control frequency required by the robot. In this paper, we address the sample efficiency and inference time challenges with two contributions. First, we define a general framework to deal with inference delays, in which a controller with slow inference provides a sequence of actions that keeps the control-hungry robotic platform supplied without execution gaps. Then, we compare several RL algorithms in the light of this framework and propose RT-HCP, an algorithm that offers an excellent trade-off between performance, sample efficiency and inference time. We validate the superiority of RT-HCP with experiments where we learn a controller directly on a simple but high-frequency FURUTA pendulum platform. Code: github.com/elasriz/RTHCP
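The delay-handling idea in the abstract, where a slow controller hands the robot a sequence of actions so that execution never stalls, can be illustrated with a small discrete-time simulation. This is a minimal sketch under stated assumptions, not the authors' RT-HCP implementation (the linked repository holds that): inference is assumed to take a fixed `delay` of control steps, each plan is assumed to start at the step where its inference finishes, and `plan_fn` and `run_with_delay` are hypothetical names.

```python
from collections import deque
from typing import Any, Callable, List, Optional, Tuple

def run_with_delay(plan_fn: Callable[[int], List[Any]], delay: int, n_steps: int) -> List[Any]:
    """Simulate a fixed-rate control loop fed by a slow, sequence-producing controller.

    Hypothetical interface: plan_fn(s) returns actions intended for steps s, s+1, ...
    An inference launched at step t is assumed to finish at step t + delay, so it
    plans from step t + delay onward (in practice, from a predicted future state).
    """
    if delay < 1:
        raise ValueError("inference is assumed to take at least one control step")
    buffer = deque(plan_fn(0))  # initial plan, assumed ready at step 0
    pending: Optional[Tuple[int, List[Any]]] = None  # (ready_step, plan) still being computed
    executed: List[Any] = []
    for t in range(n_steps):
        # Swap in the new plan as soon as its inference finishes.
        if pending is not None and pending[0] == t:
            buffer = deque(pending[1])
            pending = None
        # Immediately launch the next inference, targeting the step at which it will finish.
        if pending is None:
            pending = (t + delay, plan_fn(t + delay))
        # The robot consumes exactly one buffered action per control tick.
        if not buffer:
            raise RuntimeError(f"execution gap at step {t}")
        executed.append(buffer.popleft())
    return executed

if __name__ == "__main__":
    horizon = 5
    # Each "action" is tagged with the step it was planned for, to expose the alignment.
    print(run_with_delay(lambda s: list(range(s, s + horizon)), delay=3, n_steps=12))
```

With plans of 5 actions and a 3-step delay, the executed tags are exactly 0 through 11 with no gap. In this sketch each plan must contain at least `delay` actions; if it were shorter, the buffer would run dry before the next plan arrived and the loop would raise, which is the execution gap the framework is meant to avoid.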

Related papers

TARC: Time-Adaptive Robotic Control [48.61871569444481]
Fixed-frequency control in robotics imposes a trade-off between the efficiency of low-frequency control and the robustness of high-frequency control. We address this with a reinforcement learning approach in which policies jointly select control actions and their application durations. We validate our method with zero-shot sim-to-real experiments on two distinct hardware platforms.
Date: 2025-10-27T10:10:19Z
Assistax: A Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics [18.70896736010314]
Games have dominated reinforcement learning benchmarks because they present relevant challenges, are inexpensive to run and are easy to understand. We introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. In terms of open-loop wall-clock time, Assistax runs up to 370× faster than CPU-based alternatives when vectorising training runs.
Date: 2025-07-29T09:49:11Z
FAST: Efficient Action Tokenization for Vision-Language-Action Models [98.15494168962563]
We propose a new compression-based tokenization scheme for robot actions, based on the discrete cosine transform. Based on FAST, we release FAST+, a universal robot action tokenizer, trained on 1M real robot action trajectories.
Date: 2025-01-16T18:57:04Z
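The core of a DCT-based action tokenizer such as FAST can be illustrated by transforming an action chunk to frequency space and quantizing the coefficients. This is a minimal sketch, assuming an orthonormal DCT-II applied along the time axis of a (T, D) chunk and uniform scale-and-round quantization; `scale`, `encode_chunk` and `decode_chunk` are hypothetical, any further lossless compression of the integer sequence is omitted, and this is not the released FAST+ tokenizer.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine over n time steps."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # rescale the constant (DC) row so every row is unit-norm
    return basis

def encode_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a (T, D) action chunk to integer DCT coefficients, one column per action dimension."""
    coeffs = dct_matrix(actions.shape[0]) @ actions  # transform along the time axis
    return np.round(scale * coeffs).astype(np.int64)  # lossy uniform quantization

def decode_chunk(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert encode_chunk up to quantization error (orthonormal basis, so inverse = transpose)."""
    return dct_matrix(tokens.shape[0]).T @ (tokens / scale)

if __name__ == "__main__":
    T = 50
    # A smooth 2-D trajectory: a linear ramp and a constant.
    chunk = np.stack([np.linspace(-1.0, 1.0, T), np.full(T, 0.3)], axis=1)
    tokens = encode_chunk(chunk)
    print("nonzero coefficients:", np.count_nonzero(tokens), "of", tokens.size)
    print("max reconstruction error:", np.max(np.abs(decode_chunk(tokens) - chunk)))
```

Smooth trajectories concentrate their energy in low-frequency coefficients, so after rounding most high-frequency entries become zero, which is what makes the integer sequence compressible. Because the basis is orthonormal, the reconstruction error per dimension is bounded by `0.5 * sqrt(T) / scale`, so `scale` trades accuracy against the number of nonzero tokens.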
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator. OneDP significantly accelerates response times for robotic control tasks.
Date: 2024-10-28T17:54:31Z
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [82.46975428739329]
We develop a library containing a sample-efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment. We find that our implementation can achieve very efficient learning, acquiring policies for PCB assembly, cable routing, and object relocation. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors.
Date: 2024-01-29T10:01:10Z
Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental Validation [3.807917169053206]
Scaled robotic cars are commonly equipped with a hierarchical control architecture that includes tasks dedicated to vehicle state estimation and control. This paper covers both aspects by proposing (i) a federated extended Kalman filter (FEKF) and (ii) a novel deep reinforcement learning (DRL) path tracking controller trained via an expert demonstrator. The experimentally validated model is used for (i) supporting the design of the FEKF and (ii) serving as a digital twin for training the proposed DRL-based path tracking algorithm.
Date: 2024-01-10T14:40:53Z
Tuning Legged Locomotion Controllers via Safe Bayesian Optimization [47.87675010450171]
This paper presents a data-driven strategy to streamline the deployment of model-based controllers in legged robotic hardware platforms. We leverage a model-free safe learning algorithm to automate the tuning of control gains, addressing the mismatch between the simplified model used in the control formulation and the real system.
Date: 2023-06-12T13:10:14Z
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. We show that DCIL-II can solve some challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
Date: 2022-11-09T10:28:40Z
Training Efficient Controllers via Analytic Policy Gradient [44.0762454494769]
Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately. Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking performance, but require high computing power. We propose an Analytic Policy Gradient (APG) method to tackle this problem.
Date: 2022-09-26T22:04:35Z
An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model. We propose several techniques, including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
Date: 2022-07-08T11:42:05Z
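The two sparsity penalties named in the SPLADE summary can be written out directly. This is a minimal sketch, assuming a batch of non-negative term-weight vectors over a shared vocabulary; the FLOPS form follows the standard definition (squared mean weight per vocabulary term, summed over the vocabulary), while the function names and the plain-list representation are illustrative and not taken from the paper's code.

```python
from typing import List

def l1_reg(weights: List[List[float]]) -> float:
    """Average L1 norm of the sparse term-weight vectors in a batch (e.g. queries)."""
    return sum(sum(abs(w) for w in row) for row in weights) / len(weights)

def flops_reg(weights: List[List[float]]) -> float:
    """FLOPS regularizer: for each vocabulary term, square its mean weight over the batch, then sum."""
    n = len(weights)
    vocab = len(weights[0])
    return sum((sum(row[j] for row in weights) / n) ** 2 for j in range(vocab))

if __name__ == "__main__":
    same_term = [[1.0, 0.0], [1.0, 0.0]]  # both documents activate term 0
    spread = [[1.0, 0.0], [0.0, 1.0]]     # activations spread over two terms
    print(l1_reg(same_term), l1_reg(spread))
    print(flops_reg(same_term), flops_reg(spread))
```

Both batches have the same L1 mass (1.0 per vector), but the FLOPS penalty is 1.0 when every document activates the same term and 0.5 when activations are spread across terms. That is why it favors balanced posting lists, which keeps inverted-index retrieval cheap.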
Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms. We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
Date: 2020-10-16T18:48:49Z

This list is automatically generated from the titles and abstracts of the papers on this site.