A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
- URL: http://arxiv.org/abs/2601.06133v1
- Date: Mon, 05 Jan 2026 05:19:23 GMT
- Title: A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
- Authors: Wonhyeok Choi, Minwoo Choi, Jungwan Woo, Kyumin Hwang, Jaeyeul Kim, Sunghoon Im
- Abstract summary: Diffusion policies have emerged as a powerful approach for robotic control. This paper studies Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems.
- Score: 21.22244612145334
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion policies have emerged as a powerful approach for robotic control, demonstrating superior expressiveness in modeling multimodal action distributions compared to conventional policy networks. However, their integration with online reinforcement learning remains challenging due to fundamental incompatibilities between diffusion model training objectives and standard RL policy improvement mechanisms. This paper presents the first comprehensive review and empirical analysis of current Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems. We propose a novel taxonomy that categorizes existing approaches into four distinct families -- Action-Gradient, Q-Weighting, Proximity-Based, and Backpropagation Through Time (BPTT) methods -- based on their policy improvement mechanisms. Through extensive experiments on a unified NVIDIA Isaac Lab benchmark encompassing 12 diverse robotic tasks, we systematically evaluate representative algorithms across five critical dimensions: task diversity, parallelization capability, diffusion step scalability, cross-embodiment generalization, and environmental robustness. Our analysis identifies key findings regarding the fundamental trade-offs inherent in each algorithmic family, particularly concerning sample efficiency and scalability. Furthermore, we reveal critical computational and algorithmic bottlenecks that currently limit the practical deployment of online DPRL. Based on these findings, we provide concrete guidelines for algorithm selection tailored to specific operational constraints and outline promising future research directions to advance the field toward more general and scalable robotic learning systems.
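The abstract names four families of policy improvement mechanisms but gives no implementation details. As an illustrative sketch only, the snippet below shows the general idea behind the Q-Weighting family: weight per-sample denoising losses by an exponential of the critic's Q-values (in the spirit of advantage-weighted regression). All function names and the temperature parameter are assumptions for illustration, not the paper's actual algorithms.

```python
import numpy as np

def q_weights(q_values, temperature=1.0):
    """AWR-style exponential weighting of samples by Q-value.

    Subtracting the max before exponentiating keeps the
    computation numerically stable; weights are normalized
    to sum to one.
    """
    w = np.exp((q_values - q_values.max()) / temperature)
    return w / w.sum()

def weighted_denoising_loss(pred_noise, true_noise, weights):
    """Q-weighted diffusion training objective.

    Each row is one (state, action) sample's predicted vs. true
    noise; per-sample MSE losses are combined with the Q-derived
    weights, so high-Q actions dominate the policy update.
    """
    per_sample = ((pred_noise - true_noise) ** 2).mean(axis=1)
    return float((weights * per_sample).sum())

# Toy usage: three samples, four-dimensional noise targets.
q = np.array([0.0, 1.0, 2.0])
w = q_weights(q, temperature=1.0)
loss = weighted_denoising_loss(np.zeros((3, 4)), np.ones((3, 4)), w)
```

This sketch sidesteps backpropagation through the reverse diffusion chain entirely, which is exactly the trade-off the taxonomy highlights: Q-Weighting methods avoid BPTT's cost at the price of a less direct policy-gradient signal.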
Related papers
- Taxonomy and Trends in Reinforcement Learning for Robotics and Control Systems: A Structured Review [2.064612766965483]
This work presents an in-depth review of RL principles, advanced deep reinforcement learning (DRL) algorithms, and their integration into robotic and control systems. Emphasis is placed on modern DRL techniques such as DDPG, TD3, PPO, and SAC, which have shown promise in solving high-dimensional, continuous control tasks. The review synthesizes recent research efforts, highlighting technical trends, design patterns, and the growing maturity of RL in real-world robotics.
arXiv Detail & Related papers (2025-10-11T20:16:32Z) - Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation [0.28675177318965045]
Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem. Partial observability invalidates the Markov property present in Markov Decision Processes. We investigate partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity.
arXiv Detail & Related papers (2025-09-24T11:27:54Z) - Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adopted reinforcement learning techniques. We present clear guidelines for selecting RL techniques tailored to specific setups. We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z) - The Emergence of Deep Reinforcement Learning for Path Planning [27.08547928141541]
Deep reinforcement learning (DRL) has emerged as a powerful method for enabling autonomous agents to learn optimal navigation strategies. This survey provides a comprehensive overview of traditional approaches as well as the recent advancements in DRL applied to path planning tasks. The survey concludes by identifying key open challenges and outlining promising avenues for future research.
arXiv Detail & Related papers (2025-07-21T10:21:42Z) - What Matters for Batch Online Reinforcement Learning in Robotics? [65.06558240091758]
The ability to learn from large batches of autonomously collected data for policy improvement holds the promise of enabling truly scalable robot learning. Previous works have applied imitation learning and filtered imitation learning methods to the batch online RL problem. We analyze how these axes affect performance and scaling with the amount of autonomous data.
arXiv Detail & Related papers (2025-05-12T21:24:22Z) - Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification [5.260523686933724]
This work illuminates the differences, similarities, and fundamentals that allow for different combination algorithms. We focus on the versatile actor-critic RL approach as a basis for our categorization and examine how the online optimization approach of MPC can be used to improve the overall closed-loop performance of a policy.
arXiv Detail & Related papers (2025-02-04T09:06:07Z) - Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [86.99017195607077]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically-identical agents. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z) - Distilling Reinforcement Learning Policies for Interpretable Robot Locomotion: Gradient Boosting Machines and Symbolic Regression [53.33734159983431]
This paper introduces a novel approach to distill neural RL policies into more interpretable forms.
We train expert neural network policies using RL and distill them into (i) GBMs, (ii) EBMs, and (iii) symbolic policies.
arXiv Detail & Related papers (2024-03-21T11:54:45Z) - Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint [56.74058752955209]
This paper studies the alignment process of generative models with Reinforcement Learning from Human Feedback (RLHF)
We first identify the primary challenges of existing popular methods like offline PPO and offline DPO as lacking in strategical exploration of the environment.
We propose efficient algorithms with finite-sample theoretical guarantees.
arXiv Detail & Related papers (2023-12-18T18:58:42Z) - Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints [9.293472255463454]
This study presents a benchmark for evaluating action-constrained reinforcement learning (RL) algorithms.
We evaluate existing algorithms and their novel variants across multiple robotics control environments.
arXiv Detail & Related papers (2023-04-18T05:45:09Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.