Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation
- URL: http://arxiv.org/abs/2312.01746v1
- Date: Mon, 4 Dec 2023 09:10:25 GMT
- Title: Open-DDVM: A Reproduction and Extension of Diffusion Model for Optical Flow Estimation
- Authors: Qiaole Dong and Bo Zhao and Yanwei Fu
- Abstract summary: Google proposed DDVM, which demonstrated for the first time that a general diffusion model for image-to-image translation works impressively well.
However, DDVM remains a closed-source model that relies on expensive, private Palette-style pretraining.
In this technical report, we present the first open-source DDVM by reproducing it.
- Score: 56.51837025874472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Google proposed DDVM, which demonstrates for the first
time that a general diffusion model for image-to-image translation works
impressively well on the optical flow estimation task without any
task-specific designs like RAFT. However, DDVM is still a closed-source model
that relies on expensive, private Palette-style pretraining. In this technical
report, we present the first open-source DDVM by reproducing it. We study
several design choices and identify the important ones. By training on 40k
publicly available samples with 4 GPUs, our reproduction achieves performance
comparable to the closed-source DDVM. The code and model have been released at
https://github.com/DQiaole/FlowDiffusion_pytorch.
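The recipe sketched in the abstract is standard conditional denoising diffusion: corrupt the ground-truth flow field with noise and train a network to recover it, conditioned on the two input frames via channel-wise concatenation. Below is a minimal, hypothetical PyTorch sketch of one such training step; the names (`denoiser`, `alphas_cumprod`, the 8-channel input layout) are illustrative assumptions, not the released implementation.

```python
# Minimal sketch of a DDVM-style training step: diffusion over the optical
# flow field, conditioned on the two frames. All names here are illustrative
# assumptions; see the released repository for the actual implementation.
import torch
import torch.nn.functional as F

def flow_diffusion_step(denoiser, frame1, frame2, flow_gt, alphas_cumprod):
    """One DDPM training step for optical flow as image-to-image diffusion.

    frame1, frame2: (B, 3, H, W) input images; flow_gt: (B, 2, H, W) flow;
    alphas_cumprod: (T,) cumulative noise-schedule products, same device.
    """
    b = flow_gt.shape[0]
    # Sample a random diffusion timestep per example.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=flow_gt.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward process: noise the ground-truth flow at timestep t.
    noise = torch.randn_like(flow_gt)
    noisy_flow = a_bar.sqrt() * flow_gt + (1.0 - a_bar).sqrt() * noise
    # Condition on the frame pair by channel-wise concatenation (8 channels).
    x = torch.cat([frame1, frame2, noisy_flow], dim=1)
    # The network predicts the added noise (epsilon parameterization).
    pred = denoiser(x, t)
    return F.mse_loss(pred, noise)
```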
Related papers
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models [146.18107944503436]
Molmo is a new family of VLMs that are state-of-the-art among models of comparable openness.
Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators.
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
arXiv Detail & Related papers (2024-09-25T17:59:51Z)
- KerasCV and KerasNLP: Vision and Language Power-Ups [9.395199188271254]
KerasCV and KerasNLP are extensions of the Keras API for Computer Vision and Natural Language Processing.
These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance.
The libraries are fully open-source (Apache 2.0 license) and available on GitHub.
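As a concrete illustration of the ease-of-use claim, the quick start below follows KerasCV's published text-to-image example; the exact API may differ between releases, so treat it as a sketch rather than a reference.

```python
# Quick-start sketch following KerasCV's published Stable Diffusion example;
# verify the names against the current documentation before relying on them.
import keras_cv

# Build the pretrained Stable Diffusion pipeline at 512x512 resolution.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

# Generate a batch of three images from a single text prompt.
images = model.text_to_image(
    "photograph of an astronaut riding a horse",
    batch_size=3,
)
```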
arXiv Detail & Related papers (2024-05-30T16:58:34Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model [36.57703763466984]
We propose DiffUIR, an advanced selective hourglass mapping strategy based on a diffusion model.
We achieve state-of-the-art performance on five image restoration tasks and 22 benchmarks in both the universal and zero-shot generalization settings.
arXiv Detail & Related papers (2024-03-17T09:41:20Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised methods, can produce such robust visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
- I$^2$SB: Image-to-Image Schrödinger Bridge [87.43524087956457]
Image-to-Image Schrödinger Bridge (I$^2$SB) is a new class of conditional diffusion models.
I$^2$SB directly learns the nonlinear diffusion processes between two given distributions.
We show that I$2$SB surpasses standard conditional diffusion models with more interpretable generative processes.
arXiv Detail & Related papers (2023-02-12T08:35:39Z)
- One to Transfer All: A Universal Transfer Framework for Vision Foundation Model with Few Data [56.14205030170083]
We propose a universal transfer framework, One to Transfer All (OTA), which transfers any Vision Foundation Model (VFM) to any downstream task with little downstream data.
OTA has no dependency on upstream data, the VFM, or downstream tasks when transferring.
Extensive experiments validate the effectiveness and superiority of our method in the few-data setting.
arXiv Detail & Related papers (2021-11-24T10:10:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.