Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models
- URL: http://arxiv.org/abs/2406.04806v4
- Date: Fri, 11 Oct 2024 16:04:49 GMT
- Title: Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models
- Authors: Sigmund H. Høeg, Yilun Du, Olav Egeland
- Abstract summary: Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex tasks.
Recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis.
We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis.
- Score: 24.34842113104745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have seen rapid adoption in robotic imitation learning, enabling autonomous execution of complex dexterous tasks. However, action synthesis is often slow, requiring many steps of iterative denoising, limiting the extent to which models can be used in tasks that require fast reactive policies. To sidestep this, recent works have explored how the distillation of the diffusion process can be used to accelerate policy synthesis. However, distillation is computationally expensive and can hurt both the accuracy and diversity of synthesized actions. We propose SDP (Streaming Diffusion Policy), an alternative method to accelerate policy synthesis, leveraging the insight that generating a partially denoised action trajectory is substantially faster than a full output action trajectory. At each observation, our approach outputs a partially denoised action trajectory with variable levels of noise corruption, where the immediate action to execute is noise-free, with subsequent actions having increasing levels of noise and uncertainty. The partially denoised action trajectory for a new observation can then be quickly generated by applying a few steps of denoising to the previously predicted noisy action trajectory (rolled over by one timestep). We illustrate the efficacy of this approach, dramatically speeding up policy synthesis while preserving performance across both simulated and real-world settings.
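The streaming scheme the abstract describes can be sketched in a few lines. The following is a toy illustration only: the names (`streaming_step`, `toy_denoise_pass`, `H`) and the stand-in denoiser are assumptions for illustration, not the authors' model or API.

```python
import numpy as np

# Toy sketch of the streaming idea: keep a buffer of H future actions at
# increasing noise levels; each control step runs a few denoising passes,
# executes the (now noise-free) head action, rolls the buffer forward one
# timestep, and appends fresh noise at the tail. All names are illustrative.
H, ACTION_DIM = 8, 2
rng = np.random.default_rng(0)

def toy_denoise_pass(buf, obs):
    """Stand-in for one learned reverse-diffusion pass over the buffer.
    Slots nearer the head are blended more strongly toward a toy
    observation-conditioned target, so the head ends up noise-free."""
    target = np.tile(obs, (H, 1))                 # toy "expert action" signal
    w = 1.0 / (1.0 + np.arange(H))[:, None]       # head weight = 1 (clean)
    return (1.0 - w) * buf + w * target

def streaming_step(buf, obs, n_passes=2):
    for _ in range(n_passes):                     # only a few denoising steps
        buf = toy_denoise_pass(buf, obs)
    action = buf[0].copy()                        # immediate, noise-free action
    buf = np.roll(buf, -1, axis=0)                # roll over by one timestep
    buf[-1] = rng.standard_normal(ACTION_DIM)     # pure noise at the horizon
    return action, buf

buf = rng.standard_normal((H, ACTION_DIM))        # initialize from pure noise
obs = np.array([1.0, -1.0])
for _ in range(5):
    action, buf = streaming_step(buf, obs)
```

The point of the structure is that each new observation costs only `n_passes` network evaluations rather than a full denoising chain, because most of the trajectory is already partially denoised from earlier control steps.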
Related papers
- Primary-Fine Decoupling for Action Generation in Robotic Imitation [91.2899765310853]
The multi-modal distribution of robotic manipulation action sequences poses critical challenges for imitation learning. We propose Primary-Fine Decoupling for Action Generation (PF-DAG), a two-stage framework that decouples coarse action consistency from fine-grained variations. PF-DAG outperforms state-of-the-art baselines across 56 tasks from the Adroit, DexArt, and MetaWorld benchmarks.
arXiv Detail & Related papers (2026-02-25T08:36:45Z) - STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction [16.465783114087223]
Iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. We propose STEP, a lightweight spatiotemporal consistency prediction mechanism that constructs high-quality warm-start actions. STEP with 2 steps achieves an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks.
arXiv Detail & Related papers (2026-02-09T03:50:40Z) - Action-to-Action Flow Matching [25.301629044539325]
Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. We propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations.
arXiv Detail & Related papers (2026-02-07T02:39:49Z) - Characterizing Motion Encoding in Video Diffusion Timesteps [50.13907856401258]
We study how motion is encoded in video diffusion timesteps through the trade-off between appearance editing and motion preservation. We identify an early, motion-dominant regime and a later, appearance-dominant regime, yielding an operational motion-appearance boundary in timestep space.
arXiv Detail & Related papers (2025-12-18T21:20:54Z) - Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies [39.499082381148035]
This work studies the possibility of conducting the denoising process only partially before executing an action, allowing the plant to evolve according to its dynamics in parallel with the reverse-time diffusion dynamics running on the computer. This work contributes a framework for faster imitation learning and a metric that indicates whether a controller will be stable based on the variance of the demonstrations.
arXiv Detail & Related papers (2025-11-19T15:13:08Z) - Real-Time Iteration Scheme for Diffusion Policy [23.124189676943757]
We introduce a novel approach inspired by the Real-Time Iteration (RTI) Scheme to accelerate inference. We propose a scaling-based method to effectively handle discrete actions, such as grasping, in robotic manipulation. The proposed scheme significantly reduces runtime computational costs without the need for distillation or policy redesign.
arXiv Detail & Related papers (2025-08-07T13:49:00Z) - ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow [93.00917887667234]
This paper introduces a novel robot manipulation framework, ActionSink, to pave the way toward precise action estimation. As the name suggests, ActionSink reformulates robot actions as action-caused optical flows from videos, called "action flow". Our framework outperformed the prior SOTA on the LIBERO benchmark by a 7.9% success rate, and obtained nearly an 8% accuracy gain on the challenging long-horizon visual task LIBERO-Long.
arXiv Detail & Related papers (2025-08-05T08:46:17Z) - Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation [19.449168375853347]
We present Time-Unified Diffusion Policy (TUDP), which utilizes action recognition capabilities to build a time-unified denoising process. Our method achieves state-of-the-art performance on RLBench, with the highest success rate of 82.6% on a multi-view setup and 83.8% on a single-view setup.
arXiv Detail & Related papers (2025-06-11T06:11:49Z) - Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion [0.881371061335494]
We introduce Accelerated Rolling Diffusion, a novel framework for streaming gesture generation.
RDLA restructures the noise schedule into a stepwise ladder, allowing multiple frames to be denoised simultaneously.
This significantly improves sampling efficiency while maintaining motion consistency, achieving up to a 2x speedup.
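As a rough illustration of what a "stepwise ladder" noise schedule could look like next to a per-frame rolling schedule, here is a minimal sketch; the function name, block size, and normalization are assumptions for illustration, not the paper's notation.

```python
import numpy as np

# Hypothetical sketch: a rolling schedule assigns each frame its own noise
# level, while a "ladder" groups frames into blocks sharing one level, so
# one denoising pass can finish several frames at once.
def ladder_schedule(n_frames, block):
    levels = np.arange(n_frames) // block      # e.g. 0,0,1,1,2,2,3,3
    return levels / levels.max()               # normalized to [0, 1]

rolling = np.arange(8) / 7                     # one noise level per frame
ladder = ladder_schedule(8, 2)                 # two frames share each rung
```

With two frames per rung, each denoising pass advances a block of frames together, which is one way to read the claimed speedup from simultaneous denoising.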
arXiv Detail & Related papers (2025-03-13T15:54:45Z) - Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification [75.09791002021947]
Existing purification methods aim to disrupt adversarial perturbations by introducing a certain amount of noise through a forward diffusion process, followed by a reverse process to recover clean examples.
This approach is fundamentally flawed as the uniform operation of the forward process compromises normal pixels while attempting to combat adversarial perturbations.
We propose a heterogeneous purification strategy grounded in the interpretability of neural networks.
Our method decisively applies higher-intensity noise to specific pixels that the target model focuses on while the remaining pixels are subjected to only low-intensity noise.
arXiv Detail & Related papers (2025-03-03T11:00:25Z) - CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction [28.761494362934087]
Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.
It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.
CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
arXiv Detail & Related papers (2024-12-09T18:59:18Z) - Diffusion Implicit Policy for Unpaired Scene-aware Motion Synthesis [48.65197562914734]
We propose a unified framework, termed Diffusion Implicit Policy (DIP), for scene-aware motion synthesis.
In this framework, we disentangle human-scene interaction from motion synthesis during training.
We show that our framework presents better motion naturalness and interaction plausibility than cutting-edge methods.
arXiv Detail & Related papers (2024-12-03T08:34:41Z) - One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z) - Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy [44.09909260046396]
We propose AdaptiveDiffusion to reduce noise prediction steps during the denoising process.
Our method can significantly speed up the denoising process while generating identical results to the original process, achieving up to an average 25x speedup.
arXiv Detail & Related papers (2024-10-13T15:19:18Z) - RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation [5.535590461577558]
RecMoDiffuse is a new recurrent diffusion formulation for temporal modelling.
We demonstrate the effectiveness of RecMoDiffuse in the temporal modelling of human motion.
arXiv Detail & Related papers (2024-06-11T11:25:37Z) - SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results.
But their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling [74.62570964142063]
Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions.
We propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods.
Our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream.
arXiv Detail & Related papers (2023-08-03T16:18:32Z) - TEDi: Temporally-Entangled Diffusion for Long-Term Motion Synthesis [27.23431793291876]
We propose to adapt the gradual diffusion concept into the temporal-axis of the motion sequence.
Our key idea is to extend the DDPM framework to support temporally varying denoising, thereby entangling the two axes.
This new mechanism paves the way towards a new framework for long-term motion synthesis with applications to character animation and other domains.
arXiv Detail & Related papers (2023-07-27T17:48:44Z) - DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion [137.8749239614528]
We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video.
arXiv Detail & Related papers (2023-03-27T00:40:52Z) - Extracting Quantum Dynamical Resources: Consumption of Non-Markovianity
for Noise Reduction [0.0]
We show that the key resource responsible for noise suppression is non-Markovianity (or temporal correlations).
We propose two methods to identify optimal pulse sequences for noise reduction.
The corresponding tools are built on operational grounds and are easily implemented in the current generation of quantum devices.
arXiv Detail & Related papers (2021-10-06T09:31:34Z) - Noise Estimation for Generative Diffusion Models [91.22679787578438]
In this work, we present a simple and versatile learning scheme that can adjust the noise parameters for any given number of steps.
Our approach comes at a negligible computation cost.
arXiv Detail & Related papers (2021-04-06T15:46:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.