Related papers: ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

URL: http://arxiv.org/abs/2406.01586v1
Date: Mon, 3 Jun 2024 17:59:23 GMT
Title: ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation
Authors: Guanxing Lu, Zifeng Gao, Tianxing Chen, Wenxun Dai, Ziwei Wang, Yansong Tang,
Abstract summary: Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps. We propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint to the diffusion process.
Score: 16.272352213590313
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps, especially with high-dimensional observations. To this end, we propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint to the diffusion process, so that the model can generate robot actions in only one-step inference. Specifically, we formulate a consistent diffusion process in the robot action space conditioned on the point cloud input, where the original action is required to be directly denoised from any point along the ODE trajectory. To model this process, we design a consistency distillation technique to predict the action sample directly instead of predicting the noise within the vision community for fast convergence in the low-dimensional action manifold. We evaluate ManiCM on 31 robotic manipulation tasks from Adroit and Metaworld, and the results demonstrate that our approach accelerates the state-of-the-art method by 10 times in average inference speed while maintaining competitive average success rate.

Related papers

FRMD: Fast Robot Motion Diffusion with Consistency-Distilled Movement Primitives for Smooth Action Generation [3.7351623987275873]
We propose Fast Robot Motion Diffusion to generate smooth, temporally consistent robot motions. Our method integrates Movement Primitives (MPs) with Consistency Models to enable efficient, single-step trajectory generation. Our results show that FRMD generates significantly faster, smoother trajectories while achieving higher success rates.
arXiv Detail & Related papers (2025-03-03T20:56:39Z)
FAST: Efficient Action Tokenization for Vision-Language-Action Models [98.15494168962563]
We propose a new compression-based tokenization scheme for robot actions, based on the discrete cosine transform. Based on FAST, we release FAST+, a universal robot action tokenizer, trained on 1M real robot action trajectories.
arXiv Detail & Related papers (2025-01-16T18:57:04Z)
RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model [13.110235244912474]
Redundant manipulators offer enhanced kinematic performance and versatility. Motion planning for these manipulators is challenging due to increased DOFs and complex, dynamic environments. This paper introduces RobotDiffuse, a diffusion model-based approach for motion planning in redundant manipulators.
arXiv Detail & Related papers (2024-12-27T07:34:54Z)
Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction [2.402745776249116]
We propose training a one-step multi-layer perceptron-based (MLP-based) diffusion model for motion prediction using knowledge distillation and Bayesian optimization. Our model can significantly improve the inference speed, achieving real-time prediction without noticeable degradation in performance.
arXiv Detail & Related papers (2024-09-19T04:36:40Z)
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions. COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations. We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z)
GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction [10.982807572404166]
We present GazeMo - a novel gaze-guided denoising diffusion model to generate human motions. Our method first uses a gaze encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features. Our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final error.
arXiv Detail & Related papers (2023-12-19T12:10:12Z)
Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects [14.446751610174868]
Movement Primitive Diffusion (MPD) is a novel method for imitation learning (IL) in robot-assisted surgery. MPD combines the versatility of diffusion-based imitation learning (DIL) with the high-quality motion generation capabilities of Probabilistic Dynamic Movement Primitives (ProDMPs) We evaluate MPD across various simulated and real world robotic tasks on both state and image observations.
arXiv Detail & Related papers (2023-12-15T18:24:28Z)
Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose emphMotion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications. Our method reduces the sampling complexity from thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation [5.11432473998551]
Diffusion-EDFs is a novel SE(3)-equivariant diffusion-based approach for visual robotic manipulation tasks. We show that our proposed method achieves remarkable data efficiency, requiring only 5 to 10 human demonstrations for effective end-to-end training in less than an hour.
arXiv Detail & Related papers (2023-09-06T03:42:20Z)
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings. We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models. Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation. We are a pioneering attempt that uses DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z)
Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models. Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method. We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations [17.86873192361793]
Many state-of-the-art approaches in learning kinodynamic models require precise measurements of robot states as labeled input/output examples. We propose a new technique for learning neural kinodynamic models by performing simultaneous state estimation and dynamics learning. Our approach achieves significantly higher accuracy but is also more robust to observation noise, thereby showing promise for boosting the performance of many other robotics applications.
arXiv Detail & Related papers (2022-03-02T18:25:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.