Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
- URL: http://arxiv.org/abs/2403.02827v1
- Date: Tue, 5 Mar 2024 09:57:47 GMT
- Title: Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
- Authors: Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge,
Bo Zheng
- Abstract summary: Image-to-video (I2V) generation tasks often struggle to maintain high fidelity in open domains.
Several recent I2V frameworks can generate dynamic content for open domain images but fail to maintain fidelity.
We propose an effective method that can be applied to mainstream video diffusion models.
- Score: 23.81997037880116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-video (I2V) generation tasks often struggle to maintain
high fidelity in open domains. Traditional image animation techniques primarily
focus on specific domains such as faces or human poses, making them difficult
to generalize to open domains. Several recent I2V frameworks based on diffusion
models can generate dynamic content for open domain images but fail to maintain
fidelity. We found that two main factors of low fidelity are the loss of image
details and the noise prediction biases during the denoising process. To this
end, we propose an effective method that can be applied to mainstream video
diffusion models. This method achieves high fidelity based on supplementing
more precise image information and noise rectification. Specifically, given a
specified image, our method first adds noise to the input image latent to keep
more details, then denoises the noisy latent with proper rectification to
alleviate the noise prediction biases. Our method is tuning-free and
plug-and-play. The experimental results demonstrate the effectiveness of our
approach in improving the fidelity of generated videos. For more image-to-video
generated results, please refer to the project website:
https://noise-rectification.github.io.
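The two-step idea the abstract describes (add known noise to the image latent to preserve detail, then rectify the model's noise prediction toward that known noise during denoising) can be sketched roughly as follows. This is a minimal toy sketch, not the authors' implementation: the function names, the blending weight `w`, and the NumPy stand-in for the denoiser's output are all illustrative assumptions.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar_t):
    """DDPM-style forward step: z_t = sqrt(a_bar)*z0 + sqrt(1 - a_bar)*eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def rectify(eps_pred, eps_true, w):
    """Blend the model's noise prediction toward the known injected noise.

    w = 1 trusts the injected noise fully; w = 0 leaves the model's
    (possibly biased) prediction unchanged.
    """
    return w * eps_true + (1.0 - w) * eps_pred

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 4))      # toy clean image latent
eps = rng.standard_normal((4, 4))     # known injected noise
z_t = add_noise(z0, eps, alpha_bar_t=0.5)

# stand-in for a video diffusion model's biased noise prediction
eps_pred = eps + 0.1 * rng.standard_normal((4, 4))
eps_rect = rectify(eps_pred, eps, w=0.8)
```

Because the injected noise is known exactly, the rectified estimate `eps_rect` is strictly closer to the true noise than the raw prediction, which is the mechanism by which the rectification alleviates prediction bias.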
Related papers
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model [31.70050311326183]
I2V diffusion models (I2V-DMs) tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean video from noisy inputs.
We introduce a training-free inference strategy that starts the generation process from an earlier time step to avoid the unreliable late-time steps of I2V-DMs.
We design a time-dependent noise distribution for the conditional image, which favors high noise levels at large time steps to sufficiently interfere with the conditional image.
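The two ideas summarized above (starting sampling below the largest time step, and noising the conditional image more heavily at large t) could be sketched as below. The linear schedule, `sigma_max`, and all names here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def cond_noise_sigma(t, T, sigma_max=1.0):
    """Hypothetical time-dependent noise level for the conditional image:
    larger t -> more noise, so late steps cannot over-rely on the condition."""
    return sigma_max * (t / T)

def noised_condition(cond_latent, t, T, rng):
    sigma = cond_noise_sigma(t, T)
    return cond_latent + sigma * rng.standard_normal(cond_latent.shape)

T = 1000
t_start = 800  # start inference below T to avoid the unreliable late steps
rng = np.random.default_rng(0)
cond = rng.standard_normal((4, 4))       # toy conditional image latent
cond_t = noised_condition(cond, t_start, T, rng)
```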
arXiv Detail & Related papers (2024-06-22T04:56:16Z) - TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models [94.24861019513462]
TRIP is a new recipe of image-to-video diffusion paradigm.
It pivots on image noise prior derived from static image to jointly trigger inter-frame relational reasoning.
Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate TRIP's effectiveness.
arXiv Detail & Related papers (2024-03-25T17:59:40Z) - Real-World Denoising via Diffusion Model [14.722529440511446]
Real-world image denoising aims to recover clean images from noisy images captured in natural environments.
Diffusion models have achieved very promising results in image generation, outperforming previous generative models.
This paper proposes a novel general denoising diffusion model that can be used for real-world image denoising.
arXiv Detail & Related papers (2023-05-08T04:48:03Z) - Masked Image Training for Generalizable Deep Image Denoising [53.03126421917465]
We present a novel approach to enhance the generalization performance of denoising networks.
Our method involves masking random pixels of the input image and reconstructing the missing information during training.
Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios.
arXiv Detail & Related papers (2023-03-23T09:33:44Z) - Diffusion Model for Generative Image Denoising [17.897180118637856]
In supervised learning for image denoising, paired clean and noisy images are usually collected or synthesized to train a denoising model.
In this paper, we regard the denoising task as a problem of estimating the posterior distribution of clean images conditioned on noisy images.
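The posterior view in this summary is the standard Bayesian factorization (not specific to this paper's derivation):

```latex
p(x \mid y) \;\propto\; p(y \mid x)\, p(x)
```

where $y$ is the noisy observation, $x$ the clean image, $p(x)$ is a prior that a diffusion model can represent, and for additive Gaussian noise $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ the likelihood $p(y \mid x)$ is Gaussian.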
arXiv Detail & Related papers (2023-02-05T14:53:07Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion
Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware
Adversarial Training [50.018580462619425]
We propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN).
PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space.
For better noise fitting, we present an efficient architecture Simple Multi-versa-scale Network (SMNet) as the generator.
arXiv Detail & Related papers (2022-04-06T14:09:02Z) - Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z) - Dual Adversarial Network: Toward Real-world Noise Removal and Noise
Generation [52.75909685172843]
Real-world image noise removal is a long-standing yet very challenging task in computer vision.
We propose a novel unified framework to deal with the noise removal and noise generation tasks.
Our method learns the joint distribution of the clean-noisy image pairs.
arXiv Detail & Related papers (2020-07-12T09:16:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.