Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- URL: http://arxiv.org/abs/2412.09265v4
- Date: Thu, 19 Dec 2024 12:11:13 GMT
- Title: Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- Authors: Bofang Jia, Pengxiang Ding, Can Cui, Mingyang Sun, Pengfang Qian, Siteng Huang, Zhaoxin Fan, Donglin Wang
- Abstract summary: We propose the Score and Distribution Matching Policy (SDM Policy) for visual-motor policy learning. SDM Policy transforms diffusion-based policies into single-step generators through a two-stage optimization process. It achieves a 6x inference speedup while maintaining state-of-the-art action quality.
- Score: 29.90613565503628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual-motor policy learning has advanced with architectures like diffusion-based policies, known for modeling complex robotic trajectories. However, their prolonged inference times hinder high-frequency control tasks requiring real-time feedback. While consistency distillation (CD) accelerates inference, it introduces errors that compromise action quality. To address these limitations, we propose the Score and Distribution Matching Policy (SDM Policy), which transforms diffusion-based policies into single-step generators through a two-stage optimization process: score matching ensures alignment with true action distributions, and distribution matching minimizes KL divergence for consistency. A dual-teacher mechanism integrates a frozen teacher for stability and an unfrozen teacher for adversarial training, enhancing robustness and alignment with target distributions. Evaluated on a 57-task simulation benchmark, SDM Policy achieves a 6x inference speedup while maintaining state-of-the-art action quality, providing an efficient and reliable framework for high-frequency robotic tasks.
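The abstract's two-stage recipe can be pictured with a short, hedged sketch: a frozen pre-trained denoiser supplies a stable "true" score signal, an unfrozen copy is continually refit to the one-step student's samples, and the gap between the two drives a KL-style distribution-matching update of the student. Everything below (module names such as OneStepPolicy and Denoiser, the simplified noise schedule, the surrogate loss) is an illustrative assumption based only on the abstract, not the authors' released implementation.

```python
# Minimal PyTorch sketch of one "score + distribution matching" distillation step,
# in the spirit of the SDM Policy abstract. Module and variable names are
# illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OneStepPolicy(nn.Module):
    """Single-step generator: (observation, noise) -> action."""
    def __init__(self, obs_dim=32, act_dim=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, noise):
        return self.net(torch.cat([obs, noise], dim=-1))


class Denoiser(nn.Module):
    """Stand-in for a diffusion-policy denoiser that predicts the injected noise."""
    def __init__(self, obs_dim=32, act_dim=7, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, noisy_action, t):
        return self.net(torch.cat([obs, noisy_action, t], dim=-1))


def distillation_step(student, frozen_teacher, online_teacher,
                      opt_student, opt_online, obs, act_dim=7):
    batch = obs.shape[0]

    # 1) The student proposes actions in a single forward pass from noise.
    z = torch.randn(batch, act_dim)
    actions = student(obs, z)

    # 2) Perturb the student's actions and query the frozen teacher.
    t = torch.rand(batch, 1)
    eps = torch.randn_like(actions)
    noisy = actions + t * eps                     # simplified forward process
    with torch.no_grad():
        eps_real = frozen_teacher(obs, noisy, t)  # stable "true" score proxy

    # 3) The unfrozen teacher is refit to the student's current samples
    #    (its auxiliary/adversarial role in the dual-teacher mechanism).
    eps_fake = online_teacher(obs, noisy.detach(), t)
    loss_online = F.mse_loss(eps_fake, eps)
    opt_online.zero_grad()
    loss_online.backward()
    opt_online.step()

    # 4) Distribution matching: the disagreement between the two teachers acts as
    #    a surrogate KL gradient that is pushed back through the student.
    with torch.no_grad():
        kl_grad = online_teacher(obs, noisy, t) - eps_real
    loss_student = (actions * kl_grad).sum(dim=-1).mean()
    opt_student.zero_grad()
    loss_student.backward()
    opt_student.step()
    return loss_student.item(), loss_online.item()


# Example wiring (random tensors stand in for visual features and actions).
student = OneStepPolicy()
frozen_teacher, online_teacher = Denoiser(), Denoiser()
frozen_teacher.requires_grad_(False)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_o = torch.optim.Adam(online_teacher.parameters(), lr=1e-4)
distillation_step(student, frozen_teacher, online_teacher, opt_s, opt_o, torch.randn(16, 32))
```

The sketch only captures the high-level division of labor between the frozen and unfrozen teachers; the paper's actual adversarial training, noise scheduling, and loss weighting are not reproduced here.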
Related papers
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models.
Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.
We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
arXiv Detail & Related papers (2025-04-10T16:14:17Z) - Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles [43.11094512130205]
We present CPQE (Consistency Policy with Q-ensembles), which combines consistency models with Q-ensembles to address policy learning challenges.
CPQE achieves inference speeds of up to 60 Hz, a significant improvement over state-of-the-art diffusion policies that operate at only 20 Hz.
These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications.
arXiv Detail & Related papers (2025-03-21T09:45:59Z) - Fast and Robust Visuomotor Riemannian Flow Matching Policy [15.341017260123927]
Diffusion-based visuomotor policies excel at learning complex robotic tasks.
RFMP (Riemannian Flow Matching Policy) is a model that inherits the easy training and fast inference capabilities of flow matching.
arXiv Detail & Related papers (2024-12-14T15:03:33Z) - CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction [28.761494362934087]
Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.
It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.
CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
arXiv Detail & Related papers (2024-12-09T18:59:18Z) - One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks; a toy latency sketch of this one-step inference pattern appears after the related-papers list.
arXiv Detail & Related papers (2024-10-28T17:54:31Z) - Diffusion Policies creating a Trust Region for Offline Reinforcement Learning [66.17291150498276]
We introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy.
DTQL eliminates the need for iterative denoising sampling during both training and inference, making it remarkably computationally efficient.
We show that DTQL not only outperforms other methods on the majority of the D4RL benchmark tasks but also achieves strong training and inference efficiency.
arXiv Detail & Related papers (2024-05-30T05:04:33Z) - Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation [31.534668378308822]
Consistency Policy is a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control.
By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups.
Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.
arXiv Detail & Related papers (2024-05-13T06:53:42Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the complementary strengths of the two directions, we introduce a novel method that lies at the junction of the two routes.
We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z) - Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
IDAC (implicit distributional actor-critic) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
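Several of the entries above (SDM Policy, OneDP, Consistency Policy, CARP, CPQE) share one motif: replace many iterative denoising calls with one or a few forward passes. The toy comparison below, referenced from the One-Step Diffusion Policy entry, only illustrates that call-count arithmetic; the tiny MLP, step count, and update rule are placeholders, not any of the listed methods.

```python
# Toy latency comparison: K denoiser calls per action vs. a single distilled call.
# The network and the 100-step "update rule" are placeholders for illustration only.
import time
import torch
import torch.nn as nn

obs_dim, act_dim = 57, 7
denoiser = nn.Sequential(nn.Linear(obs_dim + act_dim, 512), nn.ReLU(),
                         nn.Linear(512, act_dim))


def iterative_policy(obs, steps=100):
    """Diffusion-style inference: one network call per denoising step."""
    x = torch.randn(obs.shape[0], act_dim)
    for _ in range(steps):
        x = x - 0.01 * denoiser(torch.cat([obs, x], dim=-1))  # placeholder update
    return x


def one_step_policy(obs):
    """Distilled inference: a single forward pass from noise."""
    z = torch.randn(obs.shape[0], act_dim)
    return denoiser(torch.cat([obs, z], dim=-1))


obs = torch.randn(1, obs_dim)
with torch.no_grad():
    for name, fn in [("iterative (100 calls)", iterative_policy),
                     ("one-step (1 call)", one_step_policy)]:
        start = time.perf_counter()
        for _ in range(50):
            fn(obs)
        ms = (time.perf_counter() - start) / 50 * 1e3
        print(f"{name}: {ms:.2f} ms per action")
```

On the same backbone, wall-clock latency tracks the number of network calls, which is the effect behind the 6x, 10x, and 60 Hz vs. 20 Hz figures quoted above, each reported under its own architecture and hardware.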