Related papers: Infusion: internal diffusion for inpainting of dynamic textures and complex motion

Infusion: internal diffusion for inpainting of dynamic textures and complex motion

URL: http://arxiv.org/abs/2311.01090v4
Date: Mon, 28 Apr 2025 07:47:49 GMT
Title: Infusion: internal diffusion for inpainting of dynamic textures and complex motion
Authors: Nicolas Cherel, Andrés Almansa, Yann Gousseau, Alasdair Newson,
Abstract summary: Video inpainting is the task of filling a region in a video in a visually convincing manner.<n> diffusion models have shown impressive results in modeling complex data distributions, including images and videos.<n>We show that in the case of video inpainting, thanks to the highly auto-similar nature of videos, the training data of a diffusion model can be restricted to the input video and still produce satisfying results.
Score: 4.912318087940015
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Video inpainting is the task of filling a region in a video in a visually convincing manner. It is very challenging due to the high dimensionality of the data and the temporal consistency required for obtaining convincing results. Recently, diffusion models have shown impressive results in modeling complex data distributions, including images and videos. Such models remain nonetheless very expensive to train and to perform inference with, which strongly reduce their applicability to videos, and yields unreasonable computational loads. We show that in the case of video inpainting, thanks to the highly auto-similar nature of videos, the training data of a diffusion model can be restricted to the input video and still produce very satisfying results. With this internal learning approach, where the training data is limited to a single video, our lightweight models perform very well with only half a million parameters, in contrast to the very large networks with billions of parameters typically found in the literature. We also introduce a new method for efficient training and inference of diffusion models in the context of internal learning, by splitting the diffusion process into different learning intervals corresponding to different noise levels of the diffusion process. We show qualitative and quantitative results, demonstrating that our method reaches or exceeds state of the art performance in the case of dynamic textures and complex dynamic backgrounds

Related papers

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset [55.82208863521353]
We propose AccVideo to reduce the inference steps for accelerating video diffusion models with synthetic dataset. Our model achieves 8.5x improvements in generation speed compared to the teacher model. Compared to previous accelerating methods, our approach is capable of generating videos with higher quality and resolution.
arXiv Detail & Related papers (2025-03-25T08:52:07Z)
From Image to Video: An Empirical Study of Diffusion Representations [37.09795196423048]
Diffusion models have revolutionized generative modeling, enabling unprecedented realism in image and video synthesis. This work marks the first direct comparison of video and image diffusion objectives for visual understanding, offering insights into the role of temporal information in representation learning.
arXiv Detail & Related papers (2025-02-10T19:53:46Z)
DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On [103.89972383310715]
DiffusionTrend harnesses latent information rich in prior information to capture the nuances of garment details. It delivers a visually compelling try-on experience, underscoring the potential of training-free diffusion model.
arXiv Detail & Related papers (2024-12-19T02:24:35Z)
Accelerating Video Diffusion Models via Distribution Matching [26.475459912686986]
This work introduces a novel framework for diffusion distillation and distribution matching. Our approach focuses on distilling pre-trained diffusion models into a more efficient few-step generator. By leveraging a combination of video GAN loss and a novel 2D score distribution matching loss, we demonstrate the potential to generate high-quality video frames.
arXiv Detail & Related papers (2024-12-08T11:36:32Z)
DIVD: Deblurring with Improved Video Diffusion Model [8.816046910904488]
Diffusion models and video diffusion models have excelled in the fields of image and video generation. We introduce a video diffusion model specifically for the task of video deblurring. Our model outperforms existing models and achieves state-of-the-art results on a range of perceptual metrics.
arXiv Detail & Related papers (2024-12-01T11:39:02Z)
Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow. We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs. This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z)
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models [6.408114351192012]
Video models require extensive training and computational resources, leading to high costs and large environmental impacts. This paper introduces a novel approach to video generation by augmenting image diffusion models to create sequential animation frames while maintaining fine detail.
arXiv Detail & Related papers (2024-10-05T12:53:05Z)
Image Neural Field Diffusion Models [46.781775067944395]
We propose to learn the distribution of continuous images by training diffusion models on image neural fields. We show that image neural field diffusion models can be trained using mixed-resolution image datasets, outperform fixed-resolution diffusion models, and can solve inverse problems with conditions applied at different scales efficiently.
arXiv Detail & Related papers (2024-06-11T17:24:02Z)
SF-V: Single Forward Video Generation Model [57.292575082410785]
We propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained models. Experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead.
arXiv Detail & Related papers (2024-06-06T17:58:27Z)
Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models. An external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference of classifier-free guided latent-space diffusion models by almost half.
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video method for generative video models in a plug-and-play manner. We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules. Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z)
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE) We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis. Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z)
SinFusion: Training Diffusion Models on a Single Image or Video [11.473177123332281]
Diffusion models exhibited tremendous progress in image and video generation, exceeding GANs in quality and diversity. In this paper we show how this can be resolved by training a diffusion model on a single input image or video. Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of the single image or video, while utilizing the conditioning capabilities of diffusion models.
arXiv Detail & Related papers (2022-11-21T18:59:33Z)
Diffusion Models in Vision: A Survey [80.82832715884597]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.