Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
- URL: http://arxiv.org/abs/2403.02827v1
- Date: Tue, 5 Mar 2024 09:57:47 GMT
- Title: Tuning-Free Noise Rectification for High Fidelity Image-to-Video
Generation
- Authors: Weijie Li, Litong Gong, Yiran Zhu, Fanda Fan, Biao Wang, Tiezheng Ge,
Bo Zheng
- Abstract summary: Image-to-video (I2V) generation tasks often struggle to maintain high fidelity in open domains.
Several recent I2V frameworks can generate dynamic content for open domain images but fail to maintain fidelity.
We propose an effective method that can be applied to mainstream video diffusion models.
- Score: 23.81997037880116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-video (I2V) generation tasks often struggle to maintain
high fidelity in open domains. Traditional image animation techniques primarily
focus on specific domains such as faces or human poses, making them difficult
to generalize to open domains. Several recent I2V frameworks based on diffusion
models can generate dynamic content for open domain images but fail to maintain
fidelity. We found that two main factors of low fidelity are the loss of image
details and the noise prediction biases during the denoising process. To this
end, we propose an effective method that can be applied to mainstream video
diffusion models. This method achieves high fidelity based on supplementing
more precise image information and noise rectification. Specifically, given a
specified image, our method first adds noise to the input image latent to keep
more details, then denoises the noisy latent with proper rectification to
alleviate the noise prediction biases. Our method is tuning-free and
plug-and-play. The experimental results demonstrate the effectiveness of our
approach in improving the fidelity of generated videos. For more image-to-video
generated results, please refer to the project website:
https://noise-rectification.github.io.
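The two-step idea the abstract describes (add known noise to the image latent to preserve detail, then rectify the model's noise prediction toward that known noise during denoising) can be sketched roughly as follows. This is a minimal toy sketch, not the authors' implementation: the function names, the blending weight `w`, and the NumPy stand-in for the denoiser's output are all illustrative assumptions.

```python
import numpy as np

def add_noise(z0, eps, alpha_bar_t):
    """DDPM-style forward step: z_t = sqrt(a_bar)*z0 + sqrt(1 - a_bar)*eps."""
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def rectify(eps_pred, eps_true, w):
    """Blend the model's noise prediction toward the known injected noise.

    w = 1 trusts the injected noise fully; w = 0 leaves the model's
    (possibly biased) prediction unchanged.
    """
    return w * eps_true + (1.0 - w) * eps_pred

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 4))      # toy clean image latent
eps = rng.standard_normal((4, 4))     # known injected noise
z_t = add_noise(z0, eps, alpha_bar_t=0.5)

# stand-in for a video diffusion model's biased noise prediction
eps_pred = eps + 0.1 * rng.standard_normal((4, 4))
eps_rect = rectify(eps_pred, eps, w=0.8)
```

Because the injected noise is known exactly, the rectified estimate `eps_rect` is strictly closer to the true noise than the raw prediction, which is the mechanism by which the rectification alleviates prediction bias.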
Related papers
- Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model [31.70050311326183]
I2V diffusion models (I2V-DMs) tend to over-rely on the conditional image at large time steps, neglecting the crucial task of predicting the clean video from noisy inputs.
We introduce a training-free inference strategy that starts the generation process from an earlier time step to avoid the unreliable late-time steps of I2V-DMs.
We design a time-dependent noise distribution for the conditional image, which favors high noise levels at large time steps to sufficiently interfere with the conditional image.
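The two ideas summarized above (starting sampling below the largest time step, and noising the conditional image more heavily at large t) could be sketched as below. The linear schedule, `sigma_max`, and all names here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def cond_noise_sigma(t, T, sigma_max=1.0):
    """Hypothetical time-dependent noise level for the conditional image:
    larger t -> more noise, so late steps cannot over-rely on the condition."""
    return sigma_max * (t / T)

def noised_condition(cond_latent, t, T, rng):
    sigma = cond_noise_sigma(t, T)
    return cond_latent + sigma * rng.standard_normal(cond_latent.shape)

T = 1000
t_start = 800  # start inference below T to avoid the unreliable late steps
rng = np.random.default_rng(0)
cond = rng.standard_normal((4, 4))       # toy conditional image latent
cond_t = noised_condition(cond, t_start, T, rng)
```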
arXiv Detail & Related papers (2024-06-22T04:56:16Z) - TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models [94.24861019513462]
TRIP is a new recipe of image-to-video diffusion paradigm.
It pivots on image noise prior derived from static image to jointly trigger inter-frame relational reasoning.
Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate TRIP's effectiveness.
arXiv Detail & Related papers (2024-03-25T17:59:40Z) - Real-World Denoising via Diffusion Model [14.722529440511446]
Real-world image denoising aims to recover clean images from noisy images captured in natural environments.
Diffusion models have achieved very promising results in image generation, outperforming previous generative models.
This paper proposes a novel general denoising diffusion model that can be used for real-world image denoising.
arXiv Detail & Related papers (2023-05-08T04:48:03Z) - Masked Image Training for Generalizable Deep Image Denoising [53.03126421917465]
We present a novel approach to enhance the generalization performance of denoising networks.
Our method involves masking random pixels of the input image and reconstructing the missing information during training.
Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios.
arXiv Detail & Related papers (2023-03-23T09:33:44Z) - Diffusion Model for Generative Image Denoising [17.897180118637856]
In supervised learning for image denoising, paired clean and noisy images are usually collected or synthesized to train a denoising model.
In this paper, we regard the denoising task as a problem of estimating the posterior distribution of clean images conditioned on noisy images.
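The posterior view in this summary is the standard Bayesian factorization (not specific to this paper's derivation):

```latex
p(x \mid y) \;\propto\; p(y \mid x)\, p(x)
```

where $y$ is the noisy observation, $x$ the clean image, $p(x)$ is a prior that a diffusion model can represent, and for additive Gaussian noise $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$ the likelihood $p(y \mid x)$ is Gaussian.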
arXiv Detail & Related papers (2023-02-05T14:53:07Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion
Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware
Adversarial Training [50.018580462619425]
We propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN).
PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space.
For better noise fitting, we present an efficient architecture Simple Multi-versa-scale Network (SMNet) as the generator.
arXiv Detail & Related papers (2022-04-06T14:09:02Z) - Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z) - Dual Adversarial Network: Toward Real-world Noise Removal and Noise
Generation [52.75909685172843]
Real-world image noise removal is a long-standing yet very challenging task in computer vision.
We propose a novel unified framework to deal with the noise removal and noise generation tasks.
Our method learns the joint distribution of the clean-noisy image pairs.
arXiv Detail & Related papers (2020-07-12T09:16:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.