Infusion: Internal Diffusion for Video Inpainting
- URL: http://arxiv.org/abs/2311.01090v1
- Date: Thu, 2 Nov 2023 08:55:11 GMT
- Title: Infusion: Internal Diffusion for Video Inpainting
- Authors: Nicolas Cherel, Andr\'es Almansa, Yann Gousseau, Alasdair Newson
- Abstract summary: Diffusion models have shown impressive results in modeling complex data distributions, including images and videos.
We show that in the case of video inpainting, thanks to the highly auto-similar nature of videos, the training of a diffusion model can be restricted to the video to inpaint.
We call our approach "Infusion": an internal learning algorithm for video inpainting through diffusion.
- Score: 4.8201607588546
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Video inpainting is the task of filling a desired region in a video in a
visually convincing manner. It is a very challenging task due to the high
dimensionality of the signal and the temporal consistency required for
obtaining convincing results. Recently, diffusion models have shown impressive
results in modeling complex data distributions, including images and videos.
Diffusion models remain nonetheless very expensive to train and perform
inference with, which strongly restrict their application to video. We show
that in the case of video inpainting, thanks to the highly auto-similar nature
of videos, the training of a diffusion model can be restricted to the video to
inpaint and still produce very satisfying results. This leads us to adopt an
internal learning approch, which also allows for a greatly reduced network
size. We call our approach "Infusion": an internal learning algorithm for video
inpainting through diffusion. Due to our frugal network, we are able to propose
the first video inpainting approach based purely on diffusion. Other methods
require supporting elements such as optical flow estimation, which limits their
performance in the case of dynamic textures for example. We introduce a new
method for efficient training and inference of diffusion models in the context
of internal learning. We split the diffusion process into different learning
intervals which greatly simplifies the learning steps. We show qualititative
and quantitative results, demonstrating that our method reaches
state-of-the-art performance, in particular in the case of dynamic backgrounds
and textures.
Related papers
- ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video method for generative video models in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation [44.92712228326116]
Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video.
We introduce MOTIA Mastering Video Outpainting Through Input-Specific Adaptation.
MoTIA comprises two main phases: input-specific adaptation and pattern-aware outpainting.
arXiv Detail & Related papers (2024-03-20T16:53:45Z) - Training-Free Semantic Video Composition via Pre-trained Diffusion Model [96.0168609879295]
Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments.
We propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge.
Experimental results reveal that our pipeline successfully ensures the visual harmony and inter-frame coherence of the outputs.
arXiv Detail & Related papers (2024-01-17T13:07:22Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention
and Text Guidance [73.19191296296988]
We propose a high-fidelity image-to-video generation method by devising a frame retention branch based on a pre-trained video diffusion model, named DreamVideo.
Our model has a powerful image retention ability and delivers the best results in UCF101 compared to other image-to-video models to our best knowledge.
arXiv Detail & Related papers (2023-12-05T03:16:31Z) - Flow-Guided Diffusion for Video Inpainting [15.478104117672803]
Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
Current methods, including emerging diffusion models, face limitations in quality and efficiency.
This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality.
arXiv Detail & Related papers (2023-11-26T17:48:48Z) - Harvest Video Foundation Models via Efficient Post-Pretraining [67.30842563833185]
We propose an efficient framework to harvest video foundation models from image ones.
Our method is intuitively simple by randomly dropping input video patches and masking out input text during the post-pretraining procedure.
Our method achieves state-of-the-art performances, which are comparable to some heavily pretrained video foundation models.
arXiv Detail & Related papers (2023-10-30T14:06:16Z) - Hierarchical Masked 3D Diffusion Model for Video Outpainting [20.738731220322176]
We introduce a masked 3D diffusion model for video outpainting.
This allows us to use multiple guide frames to connect the results of multiple video clip inferences.
We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem.
arXiv Detail & Related papers (2023-09-05T10:52:21Z) - Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE)
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z) - SinFusion: Training Diffusion Models on a Single Image or Video [11.473177123332281]
Diffusion models exhibited tremendous progress in image and video generation, exceeding GANs in quality and diversity.
In this paper we show how this can be resolved by training a diffusion model on a single input image or video.
Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of the single image or video, while utilizing the conditioning capabilities of diffusion models.
arXiv Detail & Related papers (2022-11-21T18:59:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.