R2-Diff: Denoising by diffusion as a refinement of retrieved motion for
image-based motion prediction
- URL: http://arxiv.org/abs/2306.09483v1
- Date: Thu, 15 Jun 2023 20:27:06 GMT
- Title: R2-Diff: Denoising by diffusion as a refinement of retrieved motion for
image-based motion prediction
- Authors: Takeru Oba and Norimichi Ukita
- Abstract summary: In image-based motion prediction, diffusion models predict contextually appropriate motion by gradually denoising random noise based on the image context.
In R2-Diff, a motion retrieved from a dataset based on image similarity is fed into a diffusion model instead of random noise.
R2-Diff accurately predicts appropriate motions and achieves high task success rates compared to recent state-of-the-art models in robot manipulation.
- Score: 8.104557130048407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based motion prediction is one of the essential techniques for robot
manipulation. Among the various prediction models, we focus on diffusion models
because they have achieved state-of-the-art performance in various
applications. In image-based motion prediction, diffusion models stochastically
predict contextually appropriate motion by gradually denoising random Gaussian
noise based on the image context. While diffusion models are able to predict
various motions by changing the random noise, they sometimes fail to predict a
contextually appropriate motion based on the image because the random noise is
sampled independently of the image context. To solve this problem, we propose
R2-Diff. In R2-Diff, a motion retrieved from a dataset based on image
similarity is fed into a diffusion model instead of random noise. Then, the
retrieved motion is refined through the denoising process of the diffusion
model. Since the retrieved motion is almost appropriate to the context, it
becomes easier to predict contextually appropriate motion. However, traditional
diffusion models are not optimized to refine the retrieved motion. Therefore,
we propose a method for tuning the hyperparameters based on the distance to
the nearest neighbor motion in the dataset to optimize the diffusion model
for refinement. Furthermore, we propose an image-based retrieval method to
retrieve the nearest neighbor motion at inference time. Our proposed retrieval
efficiently computes the similarity based on the image features along the
motion trajectory. We demonstrate that R2-Diff accurately predicts appropriate
motions and achieves high task success rates compared to recent
state-of-the-art models in robot manipulation.
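The retrieve-then-refine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`retrieve_motion`, `refine`, `oracle_noise`), the plain L2 feature distance, the DDPM-style update rule, the noise schedule, and the toy data are all assumptions made for illustration. In particular, `oracle_noise` is a hypothetical stand-in for a trained, image-conditioned noise predictor, and the schedule is arbitrary rather than tuned from nearest-neighbor distances as the paper proposes.

```python
import numpy as np

def retrieve_motion(query_feat, feats, motions):
    """Return the stored motion whose image feature is closest (L2) to the query."""
    dists = np.linalg.norm(feats - query_feat, axis=1)
    idx = int(np.argmin(dists))
    return motions[idx], idx

def refine(motion, predict_noise, betas, rng):
    """Run DDPM-style reverse steps starting from `motion` instead of Gaussian noise."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = motion.copy()
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t, alpha_bars)
        # Standard DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Toy data (assumed): 3 stored image features, each paired with an 8-waypoint 2-D motion.
rng = np.random.default_rng(0)
feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
motions = rng.standard_normal((3, 8, 2))
query_feat = np.array([0.9, 1.2])

# Hypothetical "appropriate" motion: close to, but not equal to, the nearest neighbor.
target = motions[1] + 0.05

def oracle_noise(x, t, alpha_bars):
    """Stand-in for a trained denoiser: the noise that would map `target` to `x` at step t."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

# A short, low-noise schedule (arbitrary choice): the retrieved motion is already near the target.
betas = np.linspace(1e-4, 1e-2, 10)

retrieved, idx = retrieve_motion(query_feat, feats, motions)
refined = refine(retrieved, oracle_noise, betas, rng)

print("retrieved index:", idx)
print("max error before refinement:", float(np.abs(retrieved - target).max()))
print("max error after refinement:", float(np.abs(refined - target).max()))
```

Because the oracle knows the target, the last reverse step lands on it by construction, so the sketch only demonstrates the control flow (retrieve, then denoise from the retrieved motion), not the prediction quality of a learned model.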
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large, generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Sequential Posterior Sampling with Diffusion Models [15.028061496012924]
We propose a novel approach that models the transition dynamics to improve the efficiency of sequential diffusion posterior sampling in conditional image synthesis.
We demonstrate the effectiveness of our approach on a real-world dataset of high frame rate cardiac ultrasound images.
Our method opens up new possibilities for real-time applications of diffusion models in imaging and other domains requiring real-time inference.
arXiv Detail & Related papers (2024-09-09T07:55:59Z)
- Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
Suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
- ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z)
- RoHM: Robust Human Motion Reconstruction via Diffusion [58.63706638272891]
RoHM is an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos.
Conditioned on noisy and occluded input data, it reconstructs complete, plausible motions in consistent global coordinates.
Our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time.
arXiv Detail & Related papers (2024-01-16T18:57:50Z)
- ExposureDiffusion: Learning to Expose for Low-light Image Enhancement [87.08496758469835]
This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model.
Our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models.
The proposed framework can work with real-paired datasets, SOTA noise models, and different backbone networks.
arXiv Detail & Related papers (2023-07-15T04:48:35Z)
- Real-World Denoising via Diffusion Model [14.722529440511446]
Real-world image denoising aims to recover clean images from noisy images captured in natural environments.
Diffusion models have achieved very promising results in the field of image generation, outperforming previous generation models.
This paper proposes a novel general denoising diffusion model that can be used for real-world image denoising.
arXiv Detail & Related papers (2023-05-08T04:48:03Z)
- Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
- Human Joint Kinematics Diffusion-Refinement for Stochastic Motion
Prediction [22.354538952573158]
MotionDiff is a diffusion probabilistic model that treats the kinematics of human joints as heated particles.
MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network to generate diverse yet plausible motions, and a graph convolutional network to further refine the outputs.
arXiv Detail & Related papers (2022-10-12T07:38:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.