Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
- URL: http://arxiv.org/abs/2405.07503v2
- Date: Fri, 28 Jun 2024 21:56:25 GMT
- Title: Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
- Authors: Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg
- Abstract summary: Consistency Policy is a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control.
By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups.
Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.
- Score: 31.534668378308822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across six simulation tasks as well as three real-world tasks, where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Consistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practitioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.
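To make the distillation step concrete, the sketch below illustrates the generic consistency-distillation recipe the abstract describes: a frozen teacher (standing in for the pretrained Diffusion Policy) defines a probability-flow trajectory over noisy actions, and the student is trained so that its predictions at two adjacent noise levels on that trajectory agree, with an EMA copy providing the target. The networks, noise schedule, dimensions, and omitted boundary parameterization are illustrative assumptions, not the paper's implementation.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
OBS_DIM, ACT_DIM = 10, 7                                   # illustrative dimensions
sigmas = torch.tensor([0.1, 0.5, 1.0, 2.0, 5.0, 80.0])     # assumed ascending noise schedule


class Denoiser(nn.Module):
    """Maps (noisy action, noise level, observation) -> denoised action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACT_DIM + 1 + OBS_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACT_DIM),
        )

    def forward(self, a_noisy, sigma, obs):
        sigma = sigma.expand(a_noisy.shape[0], 1)
        return self.net(torch.cat([a_noisy, sigma, obs], dim=-1))


teacher = Denoiser()                    # stands in for the pretrained Diffusion Policy
teacher.requires_grad_(False)
student = Denoiser()                    # the Consistency Policy being distilled
target = copy.deepcopy(student)         # EMA target network for the self-consistency loss
target.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(3):                   # a few illustrative updates
    obs = torch.randn(32, OBS_DIM)
    a_clean = torch.randn(32, ACT_DIM)  # demonstration actions
    i = torch.randint(1, len(sigmas), (1,)).item()
    sig_hi, sig_lo = sigmas[i], sigmas[i - 1]

    # Noise the action to the higher level, then take one teacher ODE (Euler) step
    # toward the lower level, so both points lie on the teacher's learned trajectory.
    a_hi = a_clean + sig_hi * torch.randn_like(a_clean)
    with torch.no_grad():
        d = (a_hi - teacher(a_hi, sig_hi.view(1, 1), obs)) / sig_hi   # PF-ODE slope
        a_lo = a_hi + (sig_lo - sig_hi) * d

    # Self-consistency: the student at sig_hi should match the EMA target at sig_lo.
    pred_hi = student(a_hi, sig_hi.view(1, 1), obs)
    with torch.no_grad():
        pred_lo = target(a_lo, sig_lo.view(1, 1), obs)
    loss = ((pred_hi - pred_lo) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update of the target network.
    with torch.no_grad():
        for p_t, p_s in zip(target.parameters(), student.parameters()):
            p_t.mul_(0.99).add_(0.01 * p_s)

# At deployment a single student call maps pure noise to an action, which is where
# the one-step (or few-step chained) inference speedup comes from.
action = student(sigmas[-1] * torch.randn(1, ACT_DIM), sigmas[-1].view(1, 1), torch.randn(1, OBS_DIM))
```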
Related papers
- Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation [29.90613565503628]
We propose the Score and Distribution Matching Policy (SDM Policy) for visual-motor policy learning.
SDM Policy transforms diffusion-based policies into single-step generators through a two-stage optimization process.
It achieves a 6x inference speedup while maintaining state-of-the-art action quality.
arXiv Detail & Related papers (2024-12-12T13:22:02Z)
- Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation [5.1245307851495]
This paper introduces Diffusion Policies For Compliant Manipulation (DIPCOM), a novel diffusion-based framework for compliant control tasks.
By leveraging generative diffusion models, we develop a policy that predicts Cartesian end-effector poses and adjusts arm stiffness to maintain the necessary force.
Our approach enhances force control through multimodal distribution modeling, improves the integration of diffusion policies in compliance control, and extends our previous work by demonstrating its effectiveness in real-world tasks.
arXiv Detail & Related papers (2024-10-25T00:56:15Z)
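For intuition about the compliant-control formulation in the DIPCOM entry above, here is a minimal sketch that pairs a predicted Cartesian target pose and stiffness with a standard impedance-style force law; the `predict_pose_and_stiffness` stub, gains, and dimensions are assumptions for illustration, not DIPCOM's actual network or controller.

```python
import numpy as np

rng = np.random.default_rng(0)


def predict_pose_and_stiffness(obs):
    """Stand-in for a diffusion policy head that outputs a Cartesian target
    position (3,) and a diagonal stiffness (3,) for the end effector."""
    target = obs[:3] + 0.01 * rng.standard_normal(3)   # small corrective motion
    stiffness = np.array([300.0, 300.0, 150.0])        # N/m, softer along the contact axis
    return target, stiffness


def impedance_force(x, x_dot, target, stiffness, damping_ratio=1.0):
    """Cartesian impedance law F = K (x_des - x) - D x_dot, with critically
    damped D derived from K for a unit virtual mass."""
    damping = 2.0 * damping_ratio * np.sqrt(stiffness)
    return stiffness * (target - x) - damping * x_dot


# One illustrative control step.
obs = rng.standard_normal(12)            # e.g. pose, velocity, wrench features
x, x_dot = obs[:3], np.zeros(3)
target, stiffness = predict_pose_and_stiffness(obs)
force_cmd = impedance_force(x, x_dot, target, stiffness)
print("commanded Cartesian force:", force_cmd)
```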
- Diffusion Policy Policy Optimization [37.04382170999901]
Diffusion Policy Policy Optimization (DPPO) is an algorithmic framework for fine-tuning diffusion-based policies.
DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks.
We show that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization.
arXiv Detail & Related papers (2024-09-01T02:47:50Z)
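A heavily simplified sketch of the idea behind DPPO as summarized above: the denoising chain of a diffusion policy is treated as a sequence of Gaussian "actions" whose log-probabilities can be re-evaluated, and a PPO-style clipped surrogate weighted by the task advantage is applied to every denoising step. The network, buffer layout, fixed noise scale, and hyperparameters are assumptions, not the DPPO implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
OBS_DIM, ACT_DIM, K = 8, 4, 5      # K = number of denoising steps
SIGMA = 0.1                        # fixed per-step denoising noise (assumed)

# Illustrative denoiser: predicts the mean of the next (less noisy) action.
denoiser = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM + 1, 64), nn.Tanh(),
                         nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam(denoiser.parameters(), lr=3e-4)


def step_logprob(obs, a_t, a_next, k):
    """Log-prob of one denoising transition a_t -> a_next, modeled as a
    Gaussian centered on the denoiser's prediction at step k."""
    inp = torch.cat([obs, a_t, torch.full((obs.shape[0], 1), float(k))], dim=-1)
    mean = denoiser(inp)
    return torch.distributions.Normal(mean, SIGMA).log_prob(a_next).sum(-1)


# Fake rollout buffer: per environment step we store the observation, the whole
# denoising chain (chain[:, 0] is pure noise), old log-probs, and the advantage.
B = 16
obs = torch.randn(B, OBS_DIM)
chain = torch.randn(B, K + 1, ACT_DIM)
adv = torch.randn(B)                            # task advantages from the env MDP
with torch.no_grad():
    old_logp = torch.stack([step_logprob(obs, chain[:, k], chain[:, k + 1], k)
                            for k in range(K)], dim=1)   # (B, K)

# PPO clipped surrogate applied to every denoising step, advantage broadcast.
clip = 0.2
new_logp = torch.stack([step_logprob(obs, chain[:, k], chain[:, k + 1], k)
                        for k in range(K)], dim=1)
ratio = (new_logp - old_logp).exp()
adv_k = adv.unsqueeze(1).expand_as(ratio)
surr = torch.minimum(ratio * adv_k, ratio.clamp(1 - clip, 1 + clip) * adv_k)
loss = -surr.mean()

opt.zero_grad()
loss.backward()
opt.step()
```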
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
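A minimal sketch of the setting in the entry above, assuming a Gaussian policy: exploration noise with standard deviation `sigma` is used only while learning, the deterministic mean is what gets deployed, and `sigma` is the knob trading sample complexity against the deployed policy's performance. The linear policy, reward, and update below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 4, 2
theta = np.zeros((ACT_DIM, OBS_DIM))     # linear deterministic policy parameters


def act_explore(obs, sigma):
    """Stochastic policy used during learning: Gaussian around the deterministic action."""
    return theta @ obs + sigma * rng.standard_normal(ACT_DIM)


def act_deploy(obs):
    """Deterministic policy actually deployed: the mean, no exploration noise."""
    return theta @ obs


def gaussian_score(obs, action, sigma):
    """Gradient of log N(action; theta @ obs, sigma^2 I) w.r.t. theta (REINFORCE term)."""
    return np.outer(action - theta @ obs, obs) / sigma**2


# One illustrative policy-gradient update; larger sigma explores more aggressively,
# but the deployed deterministic mean sits further from the data that was collected.
sigma, lr = 0.3, 1e-2
obs = rng.standard_normal(OBS_DIM)
action = act_explore(obs, sigma)
reward = -np.sum(action**2)              # stand-in reward signal
theta += lr * reward * gaussian_score(obs, action, sigma)
print("deployed action:", act_deploy(obs))
```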
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Time-Efficient Reinforcement Learning with Stochastic Stateful Policies [20.545058017790428]
We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy.
We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning algorithms.
arXiv Detail & Related papers (2023-11-07T15:48:07Z)
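The decomposition described in the entry above can be pictured as follows: a stochastic internal-state kernel proposes the next memory state and a stateless policy maps the observation and memory state to an action, so gradients never have to flow back through long histories. The tiny networks, dimensions, and the ordering of the state update relative to the action are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
OBS_DIM, ACT_DIM, Z_DIM = 6, 2, 8

# Stochastic internal state kernel p(z_{t+1} | z_t, o_t), here a diagonal Gaussian.
kernel = nn.Sequential(nn.Linear(Z_DIM + OBS_DIM, 64), nn.Tanh(),
                       nn.Linear(64, 2 * Z_DIM))
# Stateless policy pi(a_t | o_t, z), also a diagonal Gaussian.
policy = nn.Sequential(nn.Linear(OBS_DIM + Z_DIM, 64), nn.Tanh(),
                       nn.Linear(64, 2 * ACT_DIM))


def step(obs, z):
    """One interaction step: sample the next internal state, then the action.
    Returns the action, next state, and the log-probabilities of both samples,
    which is what a stateful policy-gradient estimator would accumulate."""
    mu_z, log_std_z = kernel(torch.cat([z, obs], -1)).chunk(2, -1)
    dist_z = torch.distributions.Normal(mu_z, log_std_z.exp())
    z_next = dist_z.sample()

    mu_a, log_std_a = policy(torch.cat([obs, z_next], -1)).chunk(2, -1)
    dist_a = torch.distributions.Normal(mu_a, log_std_a.exp())
    action = dist_a.sample()
    logp = dist_z.log_prob(z_next).sum(-1) + dist_a.log_prob(action).sum(-1)
    return action, z_next, logp


# Roll a short episode without any backpropagation through time.
z = torch.zeros(1, Z_DIM)
for t in range(5):
    obs = torch.randn(1, OBS_DIM)
    action, z, logp = step(obs, z)
    print(f"t={t} action={action.numpy().round(3)} logp={logp.item():.3f}")
```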
- Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts [61.929388479847525]
This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables.
The key idea is the design of switching policies that can take conformal quantiles as input.
We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics.
arXiv Detail & Related papers (2023-11-02T17:59:30Z)
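A sketch of how conformal quantiles can drive policy switching, in the spirit of the entry above: a nonconformity score is calibrated on nominal data, its finite-sample (1 - alpha) quantile becomes the threshold, and the controller falls back to a cautious base policy whenever the online score exceeds it. The score function and the two base policies are placeholders, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                  # target miscoverage level


def nonconformity(obs):
    """Placeholder score: distance of an observable from its nominal value.
    In practice this could be a prediction error or density-based score."""
    return float(np.linalg.norm(obs - 1.0))


# --- Calibration: scores from in-distribution (nominal) operation. ---
calib_obs = 1.0 + 0.1 * rng.standard_normal((200, 4))
scores = np.sort([nonconformity(o) for o in calib_obs])
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))      # finite-sample conformal rank
threshold = scores[min(k, n) - 1]

# --- Deployment: switch between base policies on the conformal quantile. ---
def nominal_policy(obs):
    return 1.0 * obs                          # aggressive, task-optimized controller


def cautious_policy(obs):
    return 0.2 * obs                          # conservative fallback controller


def switching_policy(obs):
    return nominal_policy(obs) if nonconformity(obs) <= threshold else cautious_policy(obs)


for shift in (0.0, 2.0):                      # nominal vs. distribution-shifted input
    obs = 1.0 + 0.1 * rng.standard_normal(4) + shift
    action = switching_policy(obs)
    used = "nominal" if nonconformity(obs) <= threshold else "cautious"
    print(f"shift={shift}: score={nonconformity(obs):.2f} vs threshold={threshold:.2f} -> {used} policy")
```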
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
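One way to picture the dispersion construction in the entry above, with the details assumed since the summary does not give the exact objective: a transformer-style encoder embeds each policy's state-action history, pairwise embedding distances form the dispersion matrix, and a diversity score derived from it pushes the policies apart.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N_POLICIES, SEQ_LEN, FEAT_DIM, EMB_DIM = 4, 16, 12, 32

# Transformer-style encoder mapping a history of state-action pairs to one embedding.
encoder_layer = nn.TransformerEncoderLayer(d_model=FEAT_DIM, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
head = nn.Linear(FEAT_DIM, EMB_DIM)

# One state-action history per policy (placeholder data).
histories = torch.randn(N_POLICIES, SEQ_LEN, FEAT_DIM)
emb = head(encoder(histories).mean(dim=1))        # (N_POLICIES, EMB_DIM)

# Dispersion matrix: pairwise distances between policy embeddings.
dispersion = torch.cdist(emb, emb)                # (N_POLICIES, N_POLICIES)

# Simple diversity objective (an assumption): maximize the mean pairwise distance,
# used as an auxiliary signal alongside each policy's own RL objective.
diversity = dispersion.sum() / (N_POLICIES * (N_POLICIES - 1))
loss = -diversity
loss.backward()
print("dispersion matrix:\n", dispersion.detach())
```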
- Enforcing robust control guarantees within neural network policies [76.00287474159973]
We propose a generic nonlinear control policy class, parameterized by neural networks, that enforces the same provable robustness criteria as robust control.
We demonstrate the power of this approach on several domains, improving in average-case performance over existing robust control methods and in worst-case stability over (non-robust) deep RL methods.
arXiv Detail & Related papers (2020-11-16T17:14:59Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
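A sketch of the overestimation-control ingredient mentioned in the DDPG++ entry above, using the standard clipped double-Q target (the minimum of two target critics) around an otherwise plain deterministic policy gradient update. Whether DDPG++ uses exactly this variant, and how it importance-weights replay transitions, is not specified in the summary, so treat the details as assumptions.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
OBS_DIM, ACT_DIM, GAMMA = 6, 2, 0.99

actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM), nn.Tanh())
q1 = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
q2 = copy.deepcopy(q1)
actor_t, q1_t, q2_t = map(copy.deepcopy, (actor, q1, q2))   # slow target networks
critic_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# A fake replay batch.
B = 32
obs, act = torch.randn(B, OBS_DIM), torch.rand(B, ACT_DIM) * 2 - 1
rew, next_obs, done = torch.randn(B, 1), torch.randn(B, OBS_DIM), torch.zeros(B, 1)

# Clipped double-Q target: taking the minimum of two target critics is a standard
# way to keep the overestimation bias under control.
with torch.no_grad():
    next_act = actor_t(next_obs)
    q_next = torch.minimum(q1_t(torch.cat([next_obs, next_act], -1)),
                           q2_t(torch.cat([next_obs, next_act], -1)))
    target = rew + GAMMA * (1 - done) * q_next

sa = torch.cat([obs, act], -1)
critic_loss = ((q1(sa) - target) ** 2 + (q2(sa) - target) ** 2).mean()
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Deterministic policy gradient step: push the actor uphill on Q1.
actor_loss = -q1(torch.cat([obs, actor(obs)], -1)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Polyak-average the target networks; the slow targets temper the greedy update.
with torch.no_grad():
    for net, net_t in ((actor, actor_t), (q1, q1_t), (q2, q2_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.mul_(0.995).add_(0.005 * p)
```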
- Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
arXiv Detail & Related papers (2020-04-23T14:24:44Z)
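A minimal sketch of the differentiable predictive control recipe in the last entry: a neural policy is rolled out through a known linear model, an MPC-style quadratic loss plus soft constraint penalties is accumulated over the horizon, and automatic differentiation through the closed-loop rollout supplies the policy gradient. The system matrices, constraints, and horizon are invented for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed discrete-time linear system x_{k+1} = A x_k + B u_k (double integrator).
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.005], [0.1]])
U_MAX, X_MAX, HORIZON = 1.0, 2.5, 20

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    x = 4.0 * torch.rand(64, 2) - 2.0            # batch of sampled initial states
    loss = torch.zeros(())
    for _ in range(HORIZON):
        u = policy(x)
        # MPC-style stage cost: drive the state to the origin with small control effort.
        loss = loss + (x ** 2).sum(-1).mean() + 0.1 * (u ** 2).sum(-1).mean()
        # Soft constraint penalties: state box and input box violations.
        loss = loss + 10.0 * torch.relu(x.abs() - X_MAX).sum(-1).mean()
        loss = loss + 10.0 * torch.relu(u.abs() - U_MAX).sum(-1).mean()
        x = x @ A.T + u @ B.T                     # differentiable closed-loop rollout
    opt.zero_grad()
    loss.backward()                               # backprop through the whole rollout
    opt.step()

print("final training loss:", loss.item())
```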
This list is automatically generated from the titles and abstracts of the papers on this site.