Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation
- URL: http://arxiv.org/abs/2312.01746v1
- Date: Mon, 4 Dec 2023 09:10:25 GMT
- Title: Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation
- Authors: Qiaole Dong and Bo Zhao and Yanwei Fu
- Abstract summary: Google proposed DDVM, which demonstrated for the first time that a general diffusion model for image-to-image translation works impressively well.
However, DDVM remains a closed-source model that relies on expensive, private Palette-style pretraining.
In this technical report, we present the first open-source DDVM by reproducing it.
- Score: 56.51837025874472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Google proposed DDVM, which demonstrates for the first
time that a general diffusion model for image-to-image translation works
impressively well on the optical flow estimation task without any
task-specific designs like RAFT. However, DDVM is still a closed-source model
that relies on expensive, private Palette-style pretraining. In this technical
report, we present the first open-source DDVM by reproducing it. We study
several design choices and identify the important ones. By training on 40k
publicly available samples with 4 GPUs, our reproduction achieves performance
comparable to the closed-source DDVM. The code and model have been released at
https://github.com/DQiaole/FlowDiffusion_pytorch.
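The recipe sketched in the abstract is standard conditional denoising diffusion: corrupt the ground-truth flow field with noise and train a network to recover it, conditioned on the two input frames via channel-wise concatenation. Below is a minimal, hypothetical PyTorch sketch of one such training step; the names (`denoiser`, `alphas_cumprod`, the 8-channel input layout) are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a DDVM-style training step: diffusion over the optical
# flow field, conditioned on the two frames. All names here are illustrative
# assumptions; see the released repository for the actual implementation.
import torch
import torch.nn.functional as F

def flow_diffusion_step(denoiser, frame1, frame2, flow_gt, alphas_cumprod):
    """One DDPM training step for optical flow as image-to-image diffusion.

    frame1, frame2: (B, 3, H, W) input images; flow_gt: (B, 2, H, W) flow;
    alphas_cumprod: (T,) cumulative noise-schedule products, same device.
    """
    b = flow_gt.shape[0]
    # Sample a random diffusion timestep per example.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=flow_gt.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward process: noise the ground-truth flow at timestep t.
    noise = torch.randn_like(flow_gt)
    noisy_flow = a_bar.sqrt() * flow_gt + (1.0 - a_bar).sqrt() * noise
    # Condition on the frame pair by channel-wise concatenation (8 channels).
    x = torch.cat([frame1, frame2, noisy_flow], dim=1)
    # The network predicts the added noise (epsilon parameterization).
    pred = denoiser(x, t)
    return F.mse_loss(pred, noise)
```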
Related papers
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models [146.18107944503436]
Molmo is a new family of VLMs that are state-of-the-art among models of comparable openness.
Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators.
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
arXiv Detail & Related papers (2024-09-25T17:59:51Z)
- KerasCV and KerasNLP: Vision and Language Power-Ups [9.395199188271254]
KerasCV and KerasNLP are extensions of the Keras API for Computer Vision and Natural Language Processing.
These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance.
The libraries are fully open-source (Apache 2.0 license) and available on GitHub.
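As a concrete illustration of the ease-of-use claim, the quick start below follows KerasCV's published text-to-image example; the exact API may differ between releases, so treat it as a sketch rather than a reference.

```python
# Quick-start sketch following KerasCV's published Stable Diffusion example;
# verify the names against the current documentation before relying on them.
import keras_cv

# Build the pretrained Stable Diffusion pipeline at 512x512 resolution.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

# Generate a batch of three images from a single text prompt.
images = model.text_to_image(
    "photograph of an astronaut riding a horse",
    batch_size=3,
)
```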
arXiv Detail & Related papers (2024-05-30T16:58:34Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model [36.57703763466984]
We propose DiffUIR, an advanced selective hourglass mapping strategy based on a diffusion model.
We achieve state-of-the-art performance on five image restoration tasks and 22 benchmarks in both the universal and zero-shot generalization settings.
arXiv Detail & Related papers (2024-03-17T09:41:20Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such robust visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
- I$^2$SB: Image-to-Image Schrödinger Bridge [87.43524087956457]
Image-to-Image Schrödinger Bridge (I$^2$SB) is a new class of conditional diffusion models.
I$^2$SB directly learns the nonlinear diffusion processes between two given distributions.
We show that I$2$SB surpasses standard conditional diffusion models with more interpretable generative processes.
arXiv Detail & Related papers (2023-02-12T08:35:39Z)
- One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data [56.14205030170083]
We propose a universal transfer framework, One to Transfer All (OTA), which transfers any Vision Foundation Model (VFM) to any downstream task with little downstream data.
OTA has no dependency on upstream data, the VFM, or downstream tasks when transferring.
Extensive experiments validate the effectiveness and superiority of our method in the few-data setting.
arXiv Detail & Related papers (2021-11-24T10:10:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.