Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control
- URL: http://arxiv.org/abs/2510.12363v1
- Date: Tue, 14 Oct 2025 10:25:40 GMT
- Title: Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control
- Authors: Jiale Fan, Andrei Cramariuc, Tifanny Portela, Marco Hutter,
- Abstract summary: This work aims to define a paradigm for pretraining neural network models.<n>We use a task-agnostic exploration-based data collection algorithm to gather diverse, dynamic transition data.<n>The pretrained weights are loaded into both the actor and critic networks to warm-start the policy optimization of actual tasks.
- Score: 6.288719574558261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The pretraining-finetuning paradigm has facilitated numerous transformative advancements in artificial intelligence research in recent years. However, in the domain of reinforcement learning (RL) for robot motion control, individual skills are often learned from scratch despite the high likelihood that some generalizable knowledge is shared across all task-specific policies belonging to a single robot embodiment. This work aims to define a paradigm for pretraining neural network models that encapsulate such knowledge and can subsequently serve as a basis for warm-starting the RL process in classic actor-critic algorithms, such as Proximal Policy Optimization (PPO). We begin with a task-agnostic exploration-based data collection algorithm to gather diverse, dynamic transition data, which is then used to train a Proprioceptive Inverse Dynamics Model (PIDM) through supervised learning. The pretrained weights are loaded into both the actor and critic networks to warm-start the policy optimization of actual tasks. We systematically validated our proposed method on seven distinct robot motion control tasks, showing significant benefits to this initialization strategy. Our proposed approach on average improves sample efficiency by 40.1% and task performance by 7.5%, compared to random initialization. We further present key ablation studies and empirical analyses that shed light on the mechanisms behind the effectiveness of our method.
Related papers
- A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control [21.22244612145334]
Diffusion policies have emerged as a powerful approach for robotic control.<n>Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems are studied.
arXiv Detail & Related papers (2026-01-05T05:19:23Z) - What Matters for Batch Online Reinforcement Learning in Robotics? [65.06558240091758]
The ability to learn from large batches of autonomously collected data for policy improvement holds the promise of enabling truly scalable robot learning.<n>Previous works have applied imitation learning and filtered imitation learning methods to the batch online RL problem.<n>We analyze how these axes affect performance and scaling with the amount of autonomous data.
arXiv Detail & Related papers (2025-05-12T21:24:22Z) - Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks [6.991281327290525]
We introduce a novel replay strategy, "Next-Future", which focuses on rewarding single-step transitions.<n>This approach significantly enhances sample efficiency and accuracy in learning multi-goal Markov decision processes.
arXiv Detail & Related papers (2025-04-15T14:45:51Z) - Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel value-based reinforcement learning algorithm that learns a critic network that outputs Q-values over a sequence of actions.<n>Experiments show that CQN-AS outperforms several baselines on a variety of sparse-reward humanoid control and tabletop manipulation tasks.
arXiv Detail & Related papers (2024-11-19T01:23:52Z) - Active Exploration in Bayesian Model-based Reinforcement Learning for Robot Manipulation [8.940998315746684]
We propose a model-based reinforcement learning (RL) approach for robotic arm end-tasks.
We employ Bayesian neural network models to represent, in a probabilistic way, both the belief and information encoded in the dynamic model during exploration.
Our experiments show the advantages of our Bayesian model-based RL approach, with similar quality in the results than relevant alternatives.
arXiv Detail & Related papers (2024-04-02T11:44:37Z) - Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [65.57123249246358]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.<n>On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.<n>On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains.
Recent approaches involve pre-training transformer models on vast amounts of unlabeled data.
In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z) - Don't Start From Scratch: Leveraging Prior Data to Automate Robotic
Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z) - Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
arXiv Detail & Related papers (2022-04-18T22:02:32Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.