The Surprising Effectiveness of Diffusion Models for Optical Flow and
Monocular Depth Estimation
- URL: http://arxiv.org/abs/2306.01923v2
- Date: Wed, 6 Dec 2023 04:19:29 GMT
- Title: The Surprising Effectiveness of Diffusion Models for Optical Flow and
Monocular Depth Estimation
- Authors: Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad
Norouzi, Deqing Sun, David J. Fleet
- Abstract summary: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity.
We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions.
- Score: 42.48819460873482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising diffusion probabilistic models have transformed image generation
with their impressive fidelity and diversity. We show that they also excel in
estimating optical flow and monocular depth, surprisingly, without
task-specific architectures and loss functions that are predominant for these
tasks. Compared to the point estimates of conventional regression-based
methods, diffusion models also enable Monte Carlo inference, e.g., capturing
uncertainty and ambiguity in flow and depth. With self-supervised pre-training,
the combined use of synthetic and real data for supervised training, and
technical innovations (infilling and step-unrolled denoising diffusion
training) to handle noisy-incomplete training data, and a simple form of
coarse-to-fine refinement, one can train state-of-the-art diffusion models for
depth and optical flow estimation. Extensive experiments focus on quantitative
performance against benchmarks, ablations, and the model's ability to capture
uncertainty and multimodality, and impute missing values. Our model, DDVM
(Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth
error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26%
on the KITTI optical flow benchmark, about 25% better than the best published
method. For an overview see https://diffusion-vision.github.io.
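As the abstract notes, a diffusion model yields a distribution over outputs rather than a single point estimate, so uncertainty can be read off by drawing several samples per image. The sketch below illustrates this kind of Monte Carlo inference for depth; it is a minimal illustration, not the released DDVM code, and `sample_depth_map`, `model.num_steps`, and `model.denoise_step` are assumed placeholder interfaces for one reverse-diffusion pass.

    import numpy as np

    def sample_depth_map(model, image, rng):
        # One reverse-diffusion pass: start from Gaussian noise and iteratively
        # denoise, conditioned on the RGB image. `model.num_steps` and
        # `model.denoise_step` are assumed interfaces, not the DDVM release.
        depth = rng.standard_normal(image.shape[:2])
        for t in reversed(range(model.num_steps)):
            depth = model.denoise_step(depth, image, t)
        return depth

    def monte_carlo_depth(model, image, num_samples=8, seed=0):
        # Draw several depth samples and summarize them: the per-pixel mean is
        # a point estimate, while the per-pixel standard deviation reflects
        # uncertainty and ambiguity (e.g., near depth discontinuities).
        rng = np.random.default_rng(seed)
        samples = np.stack([sample_depth_map(model, image, rng)
                            for _ in range(num_samples)])
        return samples.mean(axis=0), samples.std(axis=0)

The same recipe applies to optical flow by sampling a two-channel flow field instead of a one-channel depth map.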
Related papers
- Learning Diffusion Model from Noisy Measurement using Principled Expectation-Maximization Method [9.173055778539641]
We propose a principled expectation-maximization (EM) framework that iteratively learns diffusion models from noisy data with arbitrary corruption types.
Our framework employs a plug-and-play Monte Carlo method to accurately estimate clean images from noisy measurements, followed by training the diffusion model on the reconstructed images (see the sketch after this list).
arXiv Detail & Related papers (2024-10-15T03:54:59Z)
- Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [21.939618694037108]
Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth.
Among generative networks, we employ a diffusion model with good convergence for unsupervised monocular depth estimation.
This significantly enriches the capacity for learning and interpreting the depth distribution.
arXiv Detail & Related papers (2024-06-14T07:31:20Z)
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
- PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction [5.06136344261226]
CNN-based methods for data fidelity enhancement rely on low-fidelity data patterns and distributions during the training phase.
Our proposed model, Physics-informed Residual Diffusion, demonstrates the capability to elevate data quality from standard low-fidelity inputs.
Experimental results have shown that our approach can effectively reconstruct high-quality outcomes for two-dimensional turbulent flows without requiring retraining.
arXiv Detail & Related papers (2024-04-12T11:45:51Z)
- DepthFM: Fast Monocular Depth Estimation with Flow Matching [22.206355073676082]
Current discriminative approaches to this problem are limited by blurry artifacts.
State-of-the-art generative methods suffer from slow sampling due to their SDE nature.
We observe that this can be effectively framed using flow matching, since its straight trajectories through solution space offer efficiency and high quality.
arXiv Detail & Related papers (2024-03-20T17:51:53Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of existing distillation methods with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
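For the expectation-maximization entry above (Learning Diffusion Model from Noisy Measurement using Principled Expectation-Maximization Method), the alternating scheme described in its summary can be sketched as follows. This is a schematic reading of the abstract, not the paper's code; the `reconstruct_clean` and `train_diffusion` callables are hypothetical placeholders for the plug-and-play Monte Carlo reconstruction step and a standard denoising-diffusion training loop.

    from typing import Any, Callable, List

    def em_train_diffusion(model: Any,
                           noisy_measurements: List[Any],
                           reconstruct_clean: Callable[[Any, Any], Any],
                           train_diffusion: Callable[[Any, List[Any]], Any],
                           num_rounds: int = 5) -> Any:
        # EM-style alternation: the E-step estimates clean images from the
        # noisy measurements using the current model as a prior, and the
        # M-step retrains the diffusion model on those reconstructions.
        for _ in range(num_rounds):
            reconstructions = [reconstruct_clean(model, y)
                               for y in noisy_measurements]
            model = train_diffusion(model, reconstructions)
        return model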
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.