SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning
- URL: http://arxiv.org/abs/2512.00062v1
- Date: Mon, 24 Nov 2025 04:25:47 GMT
- Title: SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning
- Authors: Taewook Nam, Sung Ju Hwang
- Abstract summary: Reinforcement learning is a promising approach that adapts policies for faster execution without additional demonstrations. We propose SpeedAug, an RL-based policy acceleration framework that efficiently adapts pre-trained policies for faster task execution.
- Score: 52.29534291796025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in robotic policy learning have enabled complex manipulation in real-world environments, yet the execution speed of these policies often lags behind hardware capabilities due to the cost of collecting faster demonstrations. Existing works on policy acceleration reinterpret action sequences for unseen execution speeds, thereby encountering distributional shifts from the original demonstrations. Reinforcement learning is a promising approach that adapts policies for faster execution without additional demonstrations, but its unguided exploration is sample inefficient. We propose SpeedAug, an RL-based policy acceleration framework that efficiently adapts pre-trained policies for faster task execution. SpeedAug constructs a behavior prior that encompasses diverse tempos of task execution by pre-training a policy on speed-augmented demonstrations. Empirical results on robotic manipulation benchmarks show that RL fine-tuning initialized from this tempo-enriched policy significantly improves the sample efficiency of existing RL and policy acceleration methods while maintaining a high success rate.
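The abstract gives no implementation details, but the core speed-augmentation idea can be sketched: temporally resample a demonstration's action sequence so the same trajectory is traversed in fewer control steps. The following is a minimal, hypothetical sketch (the function name, linear interpolation scheme, and tempo choices are all assumptions, not the authors' implementation):

```python
import numpy as np

def speed_augment(actions: np.ndarray, speedup: float) -> np.ndarray:
    """Resample a (T, action_dim) demonstrated action sequence to a new tempo.

    A speedup of 2.0 replays the same trajectory in roughly half as many
    control steps by linearly interpolating between recorded actions.
    Hypothetical sketch; the paper's exact augmentation may differ.
    """
    t_src = np.linspace(0.0, 1.0, num=len(actions))
    n_new = max(2, round(len(actions) / speedup))
    t_new = np.linspace(0.0, 1.0, num=n_new)
    # Interpolate each action dimension independently over normalized time.
    return np.stack(
        [np.interp(t_new, t_src, actions[:, d]) for d in range(actions.shape[1])],
        axis=1,
    )

# A tempo-enriched pre-training set: the original demo plus sped-up copies,
# from which the behavior prior would be pre-trained before RL fine-tuning.
demo = np.random.randn(100, 7)  # 100 steps of a 7-DoF action trajectory
dataset = [speed_augment(demo, s) for s in (1.0, 1.5, 2.0)]
```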
Related papers
- Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation [65.13627721310613]
Mean velocity policy (MVP) is a new generative policy function that models the mean velocity field to achieve the fastest one-step action generation. MVP achieves state-of-the-art success rates across several challenging robotic manipulation tasks from Robomimic and OGBench.
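The abstract only hints at the sampling rule; a hedged sketch of one-step generation with a mean velocity field might look like the following, with `mean_velocity` as a toy stand-in for the learned network (sign and time conventions vary between flow papers, so treat this as illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_velocity(x, r, t, obs):
    """Toy stand-in for a learned mean-velocity network u(x, r, t | obs),
    trained so that u matches the average instantaneous velocity of the
    flow between times r and t."""
    return -x  # illustrative field that pulls noise toward the origin

def one_step_action(obs, action_dim=7):
    # One-step generation: a single integration step across the whole flow,
    # x1 = x0 + (t - r) * u(x0, r, t) with r = 0, t = 1, instead of many
    # small denoising steps.
    x0 = rng.standard_normal(action_dim)
    return x0 + (1.0 - 0.0) * mean_velocity(x0, 0.0, 1.0, obs)
```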
arXiv Detail & Related papers (2026-02-14T14:44:06Z)
- Improving Policy Exploitation in Online Reinforcement Learning with Instant Retrospect Action [10.983482150597913]
Existing value-based online reinforcement learning (RL) algorithms suffer from slow policy exploitation due to ineffective exploration and delayed policy updates. We propose an algorithm called Instant Retrospect Action (IRA) to address these challenges. IRA can significantly improve the learning efficiency and final performance of online RL algorithms on eight MuJoCo continuous control tasks.
arXiv Detail & Related papers (2026-01-27T15:43:02Z)
- Coverage Improvement and Fast Convergence of On-policy Preference Learning [67.36750525893514]
Online on-policy preference learning algorithms for language model alignment can significantly outperform their offline counterparts. We analyze how the sampling policy's coverage evolves throughout on-policy training. We develop principled on-policy schemes for reward distillation in the general function class setting.
arXiv Detail & Related papers (2026-01-13T10:46:06Z)
- Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning [87.81738284453013]
We first show theoretically that standard behavioral cloning (BC) can fail to ensure coverage over the demonstrator's actions. We then show that if, instead of exactly fitting the observed demonstrations, we train a policy to model the posterior distribution of the demonstrator's behavior, the resulting policy ensures coverage over the demonstrator's actions, enabling more effective finetuning.
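As a loose illustration of "modeling the posterior rather than point-fitting," one common approximation is a bootstrap ensemble of BC fits whose spread reflects uncertainty over the demonstrator's behavior. The sketch below is an assumption-laden stand-in, not the paper's construction:

```python
import numpy as np

class EnsembleBCPolicy:
    """Hedged sketch of posterior-style BC via a bootstrap ensemble.

    Rather than a single point-estimate policy, keep several BC fits, each
    trained on a bootstrap resample of the demonstrations, and sample one
    member per episode. The ensemble spread crudely approximates a
    posterior over the demonstrator's behavior, preserving coverage over
    plausible actions for later RL fine-tuning. Illustrative only.
    """

    def __init__(self, fit_fn, demos, n_members=5, seed=0):
        # fit_fn(demo_subset) -> callable policy (assumed interface).
        rng = np.random.default_rng(seed)
        self.members = []
        for _ in range(n_members):
            idx = rng.integers(0, len(demos), size=len(demos))
            self.members.append(fit_fn([demos[i] for i in idx]))
        self.active = self.members[0]

    def reset(self, rng):
        # Resample which posterior member acts for the coming episode.
        self.active = self.members[rng.integers(len(self.members))]

    def act(self, obs):
        return self.active(obs)
```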
arXiv Detail & Related papers (2025-12-18T18:59:17Z)
- Succeed or Learn Slowly: Sample Efficient Off-Policy Reinforcement Learning for Mobile App Control [50.316067647636196]
This paper introduces Succeed or Learn Slowly (SoLS), a novel off-policy reinforcement learning algorithm evaluated on mobile app control tasks. SoLS improves sample efficiency when fine-tuning foundation models for user interface navigation via a modified off-policy actor-critic approach. We augment SoLS with Successful Transition Replay (STR), which prioritises learning from successful interactions.
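The abstract describes Successful Transition Replay only at a high level; one plausible reading is a separate buffer of transitions from successful episodes that gets oversampled at training time. A minimal sketch of that reading follows (the class name and the 50/50 mixing ratio are assumptions):

```python
import random
from collections import deque

class SuccessfulTransitionReplay:
    """Sketch of an STR-style buffer: transitions from successful episodes
    are stored separately and oversampled when building training batches.
    Illustrative reading of the abstract, not the paper's implementation."""

    def __init__(self, capacity=100_000):
        self.regular = deque(maxlen=capacity)
        self.successful = deque(maxlen=capacity)

    def add_episode(self, transitions, succeeded: bool):
        self.regular.extend(transitions)
        if succeeded:
            self.successful.extend(transitions)

    def sample(self, batch_size):
        # Fill up to half the batch from successful transitions, the rest
        # from the regular buffer (assumes the regular buffer already
        # holds at least batch_size transitions).
        n_succ = min(batch_size // 2, len(self.successful))
        batch = random.sample(self.successful, n_succ)
        batch += random.sample(self.regular, batch_size - n_succ)
        return batch
```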
arXiv Detail & Related papers (2025-09-01T18:55:27Z)
- Steering Your Diffusion Policy with Latent Space Reinforcement Learning [46.598122553180005]
Behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior. Reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. We show that DSRL (Diffusion Steering via Reinforcement Learning) is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement.
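A hedged sketch of the latent-space steering idea: treat the BC diffusion policy as a black box whose initial denoising noise is chosen by a small RL actor, so RL happens entirely in the latent space (all names here are hypothetical):

```python
def dsrl_act(obs, latent_actor, diffusion_policy):
    """One environment step of latent-space steering (DSRL-style sketch).

    The RL actor's "action" is the initial denoising noise z; the frozen
    BC diffusion policy, accessed only as a black box, decodes (obs, z)
    into the actual robot action. The RL algorithm then does credit
    assignment over z rather than over raw robot actions.
    """
    z = latent_actor(obs)              # RL policy outputs a latent noise
    action = diffusion_policy(obs, z)  # frozen BC policy decodes it
    return action, z                   # store z as the RL action in replay
```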
arXiv Detail & Related papers (2025-06-18T18:35:57Z)
- SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies [20.52085846080824]
Offline Imitation Learning (IL) methods are effective at acquiring complex robotic manipulation skills. Existing IL-trained policies are confined to executing the task at the same speed as shown in the demonstration data. We introduce and formalize the novel problem of enabling faster-than-demonstration execution of visuomotor policies.
arXiv Detail & Related papers (2025-06-13T16:58:20Z)
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models. Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process. We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
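Searching a low-dimensional task-embedding space is a black-box optimization problem, so a simple cross-entropy-method loop gives the flavor. The sketch below assumes a `rollout_return(z) -> float` wrapper around the pre-trained BFM (a hypothetical interface, not the paper's API):

```python
import numpy as np

def adapt_task_embedding(rollout_return, z_dim=8, iters=10, pop=32,
                         n_elite=4, seed=0):
    """Hedged sketch of fast adaptation by black-box search over the BFM's
    task-embedding space: a tiny cross-entropy method over z, scoring each
    candidate embedding by the return of the zero-shot policy it induces."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(z_dim), np.ones(z_dim)
    for _ in range(iters):
        # Sample candidate embeddings, evaluate each with a rollout.
        zs = rng.normal(mu, sigma, size=(pop, z_dim))
        scores = np.array([rollout_return(z) for z in zs])
        # Refit the search distribution to the best-scoring candidates.
        elites = zs[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu
```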
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
- Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone [72.17534881026995]
We develop an offline and online fine-tuning approach called policy-agnostic RL (PA-RL). We show the first result that successfully fine-tunes OpenVLA, a 7B generalist robot policy, autonomously with Cal-QL, an online RL fine-tuning algorithm.
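A simplified sketch of the action-improvement idea behind a policy-agnostic fine-tuner: draw candidate actions from whatever base policy class (accessed only through sampling) and re-rank them with a learned critic. PA-RL itself includes further refinement and distillation steps beyond this; the function below is only the re-ranking core, with hypothetical names:

```python
def improved_action(obs, sample_action, q_value, n_candidates=16):
    """Sample candidate actions from the base policy (any class or
    backbone, treated as a sampler) and return the one the learned
    critic scores highest. Simplified, assumption-level sketch."""
    candidates = [sample_action(obs) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: q_value(obs, a))
```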
arXiv Detail & Related papers (2024-12-09T17:28:03Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
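JSRL's roll-in mechanism is simple enough to sketch: the pre-existing guide policy controls the early part of each episode and the learning policy takes over afterward, with training gradually moving the switch point earlier. A minimal single-episode view, assuming a classic Gym-style `step` API (names hypothetical):

```python
def jsrl_episode(env, guide_policy, explore_policy, switch_step):
    """Collect one episode JSRL-style: the guide policy "jump-starts" the
    first `switch_step` steps, then the exploration (learning) policy
    takes over. The outer training loop, not shown, shrinks switch_step
    as the learner improves. Simplified sketch."""
    obs, done, t = env.reset(), False, 0
    transitions = []
    while not done:
        policy = guide_policy if t < switch_step else explore_policy
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return transitions
```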
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control [8.720903734757627]
We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass.
We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
arXiv Detail & Related papers (2022-02-28T09:31:15Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.