Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- URL: http://arxiv.org/abs/2412.09265v4
- Date: Thu, 19 Dec 2024 12:11:13 GMT
- Title: Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation
- Authors: Bofang Jia, Pengxiang Ding, Can Cui, Mingyang Sun, Pengfang Qian, Siteng Huang, Zhaoxin Fan, Donglin Wang
- Abstract summary: We propose the Score and Distribution Matching Policy (SDM Policy) for visual-motor policy learning. SDM Policy transforms diffusion-based policies into single-step generators through a two-stage optimization process. It achieves a 6x inference speedup while maintaining state-of-the-art action quality.
- Score: 29.90613565503628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual-motor policy learning has advanced with architectures like diffusion-based policies, known for modeling complex robotic trajectories. However, their prolonged inference times hinder high-frequency control tasks requiring real-time feedback. While consistency distillation (CD) accelerates inference, it introduces errors that compromise action quality. To address these limitations, we propose the Score and Distribution Matching Policy (SDM Policy), which transforms diffusion-based policies into single-step generators through a two-stage optimization process: score matching ensures alignment with true action distributions, and distribution matching minimizes KL divergence for consistency. A dual-teacher mechanism integrates a frozen teacher for stability and an unfrozen teacher for adversarial training, enhancing robustness and alignment with target distributions. Evaluated on a 57-task simulation benchmark, SDM Policy achieves a 6x inference speedup while maintaining state-of-the-art action quality, providing an efficient and reliable framework for high-frequency robotic tasks.
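The abstract describes the mechanism only at a high level: a single-step generator is distilled from a diffusion policy using a frozen teacher that scores the true action distribution and an online, adversarially trained teacher that scores the generator's own distribution, with the difference of the two scores driving down the KL divergence between them. As a rough illustration of a distillation step in this spirit (following the common distribution-matching-distillation recipe), here is a minimal PyTorch-style sketch; every module name, signature, and the noise schedule are assumptions for illustration, not the paper's actual code.

```python
import torch
import torch.nn.functional as F

def sdm_style_distill_step(generator, frozen_teacher, online_teacher, obs, sigma=0.5):
    """One hypothetical distillation step; all names and signatures are
    illustrative assumptions, not the SDM Policy codebase.

    generator(obs, z)          -> action in a single forward pass
    frozen_teacher(obs, x, s)  -> score of the true action distribution at noise level s
    online_teacher(obs, x, s)  -> score of the generator's distribution (updated separately)
    """
    z = torch.randn(obs.shape[0], 16)   # latent noise for the one-step generator
    action = generator(obs, z)          # single-step action generation

    # Scores are defined on noised samples, so perturb the generated action.
    noised = action + sigma * torch.randn_like(action)

    with torch.no_grad():
        s_real = frozen_teacher(obs, noised, sigma)   # stable, frozen teacher
        s_fake = online_teacher(obs, noised, sigma)   # adversarially updated teacher

    # (s_fake - s_real) approximates the gradient of the KL between the
    # generator's action distribution and the teacher's; the MSE against a
    # shifted, detached target injects exactly that gradient into the generator.
    grad = s_fake - s_real
    target = (action - grad).detach()
    return 0.5 * F.mse_loss(action, target)
```

In the full method the online teacher is itself trained against the generator's samples, which is the adversarial half of the dual-teacher mechanism; the sketch above shows only the generator's side of that loop.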
Related papers
- SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training [54.8494905524997]
Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. We propose SENTINEL, a verification mechanism for pipeline parallelism (PP) training without duplication. Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
arXiv Detail & Related papers (2026-03-03T23:51:10Z)
- Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving [22.3805998088591]
DACER-F is a flow matching algorithm for generative policies in autonomous driving systems. It achieves a score of 775.8 on the humanoid-stand task, surpassing prior methods.
arXiv Detail & Related papers (2026-03-03T05:35:53Z)
- Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation [65.13627721310613]
Mean velocity policy (MVP) is a new generative policy function that models the mean velocity field to achieve the fastest one-step action generation. MVP achieves state-of-the-art success rates across several challenging robotic manipulation tasks from Robomimic and OGBench.
arXiv Detail & Related papers (2026-02-14T14:44:06Z)
- STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction [16.465783114087223]
Iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. We propose STEP, a lightweight spatiotemporal consistency prediction mechanism to construct high-quality warm-start actions. STEP with 2 steps achieves an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks.
arXiv Detail & Related papers (2026-02-09T03:50:40Z)
- AdaSwitch: Adaptive Switching Generation for Knowledge Distillation [58.647880811071495]
Small language models (SLMs) are crucial for applications with strict latency and computational constraints. We propose AdaSwitch, a novel approach that combines on-policy and off-policy generation at the token level. AdaSwitch consistently improves accuracy, offering a practical and effective method for distilling SLMs with acceptable additional overhead.
arXiv Detail & Related papers (2025-10-09T06:38:37Z)
- Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces [57.466101098183884]
Reinforcement learning (RL) struggles to scale to the large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in complex settings.
arXiv Detail & Related papers (2025-09-26T21:53:36Z)
- Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments [6.956559003734227]
Unmanned aerial vehicles (UAVs) have been exposed to adversarial attacks that exploit vulnerabilities in reinforcement learning (RL). This paper introduces an antifragile RL framework that enhances adaptability to broader distributional shifts. It achieves superior performance, demonstrating shorter navigation path lengths and a higher rate of conflict-free navigation trajectories.
arXiv Detail & Related papers (2025-06-26T10:06:29Z)
- Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning [37.540612510652174]
We derive the distribution of joint target detection probabilities across multiple receiving APs under false alarm rate constraints. We then formulate the beam selection procedure as a Markov decision process (MDP). To eliminate the high costs and associated risks of real-time agent-environment interactions, we propose a novel digital twin (DT)-assisted offline DRL approach.
arXiv Detail & Related papers (2025-06-23T12:17:57Z)
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs).
Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process.
We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
- Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles [43.11094512130205]
We present CPQE (Consistency Policy with Q-ensembles), which combines consistency models with Q-ensembles to address policy learning challenges.
CPQE achieves inference speeds of up to 60 Hz, a significant improvement over state-of-the-art diffusion policies that operate at only 20 Hz.
These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications.
arXiv Detail & Related papers (2025-03-21T09:45:59Z)
- Fast and Robust Visuomotor Riemannian Flow Matching Policy [15.341017260123927]
Diffusion-based visuomotor policies excel at learning complex robotic tasks.
The Riemannian Flow Matching Policy (RFMP) is a model that inherits the easy training and fast inference capabilities of flow matching.
arXiv Detail & Related papers (2024-12-14T15:03:33Z)
- CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction [28.761494362934087]
Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.
It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.
CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
arXiv Detail & Related papers (2024-12-09T18:59:18Z)
- One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
One-Step Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z)
- Diffusion Policies creating a Trust Region for Offline Reinforcement Learning [66.17291150498276]
We introduce a dual policy approach, Diffusion Trusted Q-Learning (DTQL), which comprises a diffusion policy for pure behavior cloning and a practical one-step policy.
DTQL eliminates the need for iterative denoising sampling during both training and inference, making it remarkably computationally efficient.
We show that DTQL not only outperforms other methods on the majority of the D4RL benchmark tasks but also demonstrates efficiency in training and inference speeds.
arXiv Detail & Related papers (2024-05-30T05:04:33Z)
- Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation [31.534668378308822]
Consistency Policy is a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control.
By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups.
Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps (a minimal sketch of this style of consistency objective appears after this list).
arXiv Detail & Related papers (2024-05-13T06:53:42Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the complementary strengths of the two directions, we introduce a novel method that lies at the junction of the two routes. We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution. We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
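Several entries above (Consistency Policy, CPQE, OneDP) accelerate diffusion policies via consistency-style distillation, and the SDM Policy abstract positions itself against exactly this family. As a rough illustration of the consistency objective these methods build on, here is a minimal sketch; the function names, the variance-exploding noising, and the teacher interface are all assumptions, not any of the papers' actual code.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, student_ema, teacher_ode_step, x0, t, t_prev):
    """Minimal consistency-distillation objective (illustrative only).

    student(x_t, t)                  -> predicted clean sample
    student_ema(x_t, t)              -> EMA copy of the student, used as target
    teacher_ode_step(x_t, t, t_prev) -> one probability-flow ODE step of the
                                        pretrained diffusion teacher, from t to t_prev
    """
    # Noise the clean sample to time t (simple variance-exploding schedule,
    # assumed here for brevity; x0 is (batch, dim), t is (batch,)).
    x_t = x0 + t.view(-1, 1) * torch.randn_like(x0)

    with torch.no_grad():
        x_prev = teacher_ode_step(x_t, t, t_prev)   # one solver step toward t_prev
        target = student_ema(x_prev, t_prev)        # target from the EMA student

    # Self-consistency: two points on the same ODE trajectory should map to the
    # same clean sample, which is what lets the student denoise in one step.
    return F.mse_loss(student(x_t, t), target)
```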