Flow-Guided Diffusion for Video Inpainting
- URL: http://arxiv.org/abs/2311.15368v1
- Date: Sun, 26 Nov 2023 17:48:48 GMT
- Title: Flow-Guided Diffusion for Video Inpainting
- Authors: Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang
- Abstract summary: Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
Current methods, including emerging diffusion models, face limitations in quality and efficiency.
This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality.
- Score: 15.478104117672803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video inpainting has been challenged by complex scenarios like large
movements and low-light conditions. Current methods, including emerging
diffusion models, face limitations in quality and efficiency. This paper
introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a
novel approach that significantly enhances temporal consistency and inpainting
quality by reusing an off-the-shelf image generation diffusion model. We
employ optical flow for precise one-step latent propagation and introduce a
model-agnostic flow-guided latent interpolation technique. This technique
expedites denoising, seamlessly integrating with any Video Diffusion Model
(VDM) without additional training. Our FGDVI demonstrates a remarkable 10%
improvement in flow warping error E_warp over existing state-of-the-art
methods. Our comprehensive experiments validate the superior performance of
FGDVI, offering a promising direction for advanced video inpainting. The code
and detailed results will be publicly available at
https://github.com/NevSNev/FGDVI.
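Neither the released code nor the exact formulation is reproduced in this listing, but a natural reading of the abstract is that both the one-step latent propagation and the flow warping error E_warp hinge on backward warping with optical flow. The PyTorch sketch below illustrates that shared operation under assumed tensor shapes and an assumed non-occlusion mask; the function names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp x (B, C, H, W) with an optical flow field (B, 2, H, W).

    flow[:, 0] holds horizontal and flow[:, 1] vertical displacements in pixels.
    """
    _, _, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype),
        indexing="ij",
    )
    # Shift the pixel grid by the flow, then normalise to [-1, 1] for grid_sample.
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(x, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

def warping_error(curr: torch.Tensor, prev: torch.Tensor,
                  flow: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    """Average warping error over non-occluded pixels (valid: B, 1, H, W, 1 = keep)."""
    valid = valid.to(curr.dtype)
    diff = (curr - warp_with_flow(prev, flow)).abs().mean(dim=1, keepdim=True)
    return (diff * valid).sum() / valid.sum().clamp(min=1.0)
```

In this sketch, propagating a latent (or frame) from time t-1 to time t is a single call to warp_with_flow, and the warping error is the masked average of whatever residual survives the warp.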
Related papers
- One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion method for real-world image super-resolution (Real-ISR) built on flow matching models.
First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR.
Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z)
- VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models [21.584843961386888]
VipDiff is a framework that conditions the diffusion model during the reverse diffusion process to produce temporally coherent inpainting results.
It can largely outperform state-of-the-art video inpainting methods in terms of both spatial-temporal coherence and fidelity.
arXiv Detail & Related papers (2025-01-21T16:39:09Z)
- DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater detail and more coherent structures.
We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z)
- Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion [13.649604333753727]
This paper proposes an advanced video inpainting framework using optical Flow-guided Efficient Diffusion, called FloED.
FloED employs a dual-branch architecture, where a flow branch first restores corrupted flow and a multi-scale flow adapter provides motion guidance to the main inpainting branch.
Experiments in both background restoration and object removal tasks demonstrate that FloED outperforms state-of-the-art methods from the perspective of both performance and efficiency.
arXiv Detail & Related papers (2024-12-01T15:45:26Z)
- VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide [48.22321420680046]
VideoGuide is a novel framework that enhances the temporal consistency of pretrained text-to-video (T2V) models.
It improves temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process.
The proposed method brings about significant improvement in temporal consistency and image fidelity.
arXiv Detail & Related papers (2024-10-06T05:46:17Z)
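As a rough illustration of the interpolation described above, the sketch below blends the guiding model's denoised (x0) estimate into the sampling model's estimate at each DDIM step. The function and argument names, the blending weight, and the deterministic (eta = 0) update are assumptions made for this sketch; the actual VideoGuide procedure may differ.

```python
import torch

@torch.no_grad()
def guided_denoising_step(x_t, t, eps_sampler, eps_guider, alpha_bars, blend=0.3):
    """One DDIM-style reverse step in which the guiding model's denoised
    estimate is interpolated into the sampling model's estimate.

    eps_sampler / eps_guider predict the noise residual; alpha_bars is the
    cumulative noise schedule indexed by timestep. All names and the blending
    weight are illustrative assumptions.
    """
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t - 1] if t > 0 else torch.ones_like(a_t)

    eps_s = eps_sampler(x_t, t)
    eps_g = eps_guider(x_t, t)

    # Denoised (x0) estimates from the sampling and the guiding model.
    x0_s = (x_t - (1 - a_t).sqrt() * eps_s) / a_t.sqrt()
    x0_g = (x_t - (1 - a_t).sqrt() * eps_g) / a_t.sqrt()

    # Interpolate the guiding model's sample into the sampling trajectory.
    x0 = (1.0 - blend) * x0_s + blend * x0_g

    # Deterministic (eta = 0) DDIM update toward the previous timestep.
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps_s
```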
- Video Diffusion Models are Strong Video Inpainter [14.402778136825642]
We propose a novel First Frame Filling Video Diffusion Inpainting model (FFF-VDI).
We propagate the noise latent information of future frames to fill the masked areas of the first frame's noise latent code.
Next, we fine-tune the pre-trained image-to-video diffusion model to generate the inpainted video.
arXiv Detail & Related papers (2024-08-21T08:01:00Z)
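A minimal sketch of the propagation idea summarised above, assuming per-frame noise latents and masks: masked locations in the first frame's latent are filled from later frames where the same locations are visible. Frame-to-frame alignment and the fine-tuned image-to-video model are omitted; names, shapes, and the mask convention are assumptions rather than the FFF-VDI implementation.

```python
import torch

def fill_first_frame_latent(latents: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Fill masked regions of the first frame's noise latent with latent
    values from later frames where those regions are visible.

    latents: (T, C, H, W) per-frame noise latents.
    masks:   (T, 1, H, W) with 1 marking masked (unknown) regions.
    Frame-to-frame alignment is deliberately omitted in this sketch.
    """
    first = latents[0].clone()
    hole = masks[0].bool()  # locations still missing in the first frame

    for t in range(1, latents.shape[0]):
        # Locations missing in frame 0 but visible in frame t.
        usable = hole & ~masks[t].bool()
        first = torch.where(usable.expand_as(first), latents[t], first)
        hole = hole & ~usable
        if not hole.any():
            break
    return first
```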
- Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improve sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, achieving a substantial speedup in computation compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
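To make the masked-autoencoder formulation above concrete, here is a hypothetical training objective in that spirit: only masked regions are noised, the clean visible regions serve as conditioning, and the denoising loss is restricted to the masked regions. The function name, model signature, and schedule handling are assumptions, not the DiffMAE code.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, x0, mask, alpha_bars):
    """Hypothetical masked-autoencoder-style diffusion objective: noise only
    the masked regions, condition on the clean visible regions, and supervise
    the noise prediction on the masked regions alone.

    x0: (B, C, H, W) clean images; mask: (B, 1, H, W) with 1 = masked.
    """
    b = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (b,), device=x0.device)
    a_t = alpha_bars[t].view(b, 1, 1, 1)

    noise = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise

    # Masked regions carry the noised signal; visible regions stay clean.
    model_in = mask * x_t + (1 - mask) * x0
    pred = model(model_in, t, mask)  # assumed to predict the added noise

    # Loss restricted to the regions that were masked out.
    per_pixel = F.mse_loss(pred, noise, reduction="none")
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```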
- VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse images.
We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition.
We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.