Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- URL: http://arxiv.org/abs/2412.09265v4
- Date: Thu, 19 Dec 2024 12:11:13 GMT
- Title: Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- Authors: Bofang Jia, Pengxiang Ding, Can Cui, Mingyang Sun, Pengfang Qian, Siteng Huang, Zhaoxin Fan, Donglin Wang
- Abstract summary: We propose the Score and Distribution Matching Policy (SDM Policy) for visuomotor policy learning.
SDM Policy transforms diffusion-based policies into single-step generators through a two-stage optimization process.
It achieves a 6x inference speedup while maintaining state-of-the-art action quality.
- Score: 29.90613565503628
- Abstract: Visuomotor policy learning has advanced with architectures like diffusion-based policies, known for modeling complex robotic trajectories. However, their prolonged inference times hinder high-frequency control tasks that require real-time feedback. While consistency distillation (CD) accelerates inference, it introduces errors that compromise action quality. To address these limitations, we propose the Score and Distribution Matching Policy (SDM Policy), which transforms diffusion-based policies into single-step generators through a two-stage optimization process: score matching ensures alignment with true action distributions, and distribution matching minimizes KL divergence for consistency. A dual-teacher mechanism integrates a frozen teacher for stability and an unfrozen teacher for adversarial training, enhancing robustness and alignment with target distributions. Evaluated on a 57-task simulation benchmark, SDM Policy achieves a 6x inference speedup while maintaining state-of-the-art action quality, providing an efficient and reliable framework for high-frequency robotic tasks.
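The two-stage recipe described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch-style example, not the authors' code: a frozen diffusion-policy teacher represents the target action distribution, an unfrozen copy is continually re-fitted to the single-step generator's outputs (score matching), and the difference of their denoised predictions drives a KL-based distribution-matching update of the generator. All module interfaces, helper functions, and hyperparameters below are hypothetical, and the paper's adversarial (GAN-style) training of the unfrozen teacher is omitted.

```python
import torch
import torch.nn.functional as F

def forward_diffuse(x0, t, num_timesteps=1000):
    """Hypothetical variance-preserving forward noising: returns (x_t, eps, alpha_bar)."""
    alpha_bar = 1.0 - t.float().view(-1, *([1] * (x0.dim() - 1))) / num_timesteps
    eps = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * eps, eps, alpha_bar

def pred_x0(eps_model, x_t, t, obs, alpha_bar):
    """Denoised action estimate from a noise-prediction teacher (hypothetical interface)."""
    eps_pred = eps_model(x_t, t, obs)
    return (x_t - (1.0 - alpha_bar).sqrt() * eps_pred) / alpha_bar.sqrt()

def distillation_step(generator, frozen_teacher, unfrozen_teacher,
                      obs, noise, g_opt, t_opt, num_timesteps=1000):
    # --- Generator update: distribution matching (KL between generator and target) ---
    actions = generator(obs, noise)  # single-step action prediction
    t = torch.randint(1, num_timesteps, (actions.shape[0],), device=actions.device)
    x_t, _, alpha_bar = forward_diffuse(actions, t, num_timesteps)

    with torch.no_grad():
        x0_real = pred_x0(frozen_teacher, x_t, t, obs, alpha_bar)    # target distribution
        x0_fake = pred_x0(unfrozen_teacher, x_t, t, obs, alpha_bar)  # generator distribution
        # The difference of denoised estimates approximates the KL gradient direction,
        # pulling generator samples toward the teacher's action distribution.
        target = (actions - (x0_fake - x0_real)).detach()

    g_loss = F.mse_loss(actions, target)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # --- Unfrozen-teacher update: score matching on generator samples so it tracks G ---
    actions = generator(obs, noise).detach()
    x_t, eps, _ = forward_diffuse(actions, t, num_timesteps)
    t_loss = F.mse_loss(unfrozen_teacher(x_t, t, obs), eps)
    t_opt.zero_grad(); t_loss.backward(); t_opt.step()
    # (The abstract's dual-teacher mechanism additionally trains this unfrozen teacher
    #  adversarially; that discriminator branch is omitted in this sketch.)
    return g_loss.item(), t_loss.item()
```

Keeping the second teacher unfrozen is what lets the distribution-matching term track the generator as it improves; the frozen teacher supplies a stable target, which is the split of roles the abstract attributes to the dual-teacher mechanism.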
Related papers
- Fast and Robust Visuomotor Riemannian Flow Matching Policy [15.341017260123927]
Diffusion-based visuomotor policies excel at learning complex robotic tasks.
RFMP is a model that inherits the easy training and fast inference capabilities of flow matching.
arXiv Detail & Related papers (2024-12-14T15:03:33Z)
- CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction [28.761494362934087]
Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.
It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.
CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
arXiv Detail & Related papers (2024-12-09T18:59:18Z)
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z)
- Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation [31.534668378308822]
Consistency Policy is a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control.
By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups.
Key design decisions that enable this performance are the choice of consistency objective, reduced initial sample variance, and preset chaining steps.
arXiv Detail & Related papers (2024-05-13T06:53:42Z)
- Integrating DeepRL with Robust Low-Level Control in Robotic Manipulators for Non-Repetitive Reaching Tasks [0.24578723416255746]
In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability.
We propose integrating a collision-free trajectory planner based on deep reinforcement learning (DRL) with a novel auto-tuning low-level control strategy.
arXiv Detail & Related papers (2024-02-04T15:54:03Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the complementary strengths of randomness-based and saliency-guided mix-up, we introduce a novel method that lies at the junction of the two routes.
We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
Implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)