Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
- URL: http://arxiv.org/abs/2407.10285v1
- Date: Sun, 14 Jul 2024 17:59:56 GMT
- Title: Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
- Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan,
- Abstract summary: We propose a novel formulation that considers both visual quality and consistency of content.
Consistency of content is ensured by a proposed loss function that maintains the structure of the input, while visual quality is improved by utilizing the denoising process of pretrained diffusion models.
- Score: 47.518487213173785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation that considers both visual quality and consistency of content. Consistency of content is ensured by a proposed loss function that maintains the structure of the input, while visual quality is improved by utilizing the denoising process of pretrained diffusion models. To address the formulated optimization problem, we have developed a plug-and-play noise optimization strategy, referred to as Noise Calibration. By refining the initial random noise through a few iterations, the content of original video can be largely preserved, and the enhancement effect demonstrates a notable improvement. Extensive experiments have demonstrated the effectiveness of the proposed method.
Related papers
- NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing [3.6344789837383145]
We propose a video editing framework, NaRCan, which integrates a hybrid deformation field and diffusion prior to generate high-quality natural canonical images.
Our approach utilizes homography to model global motion and employs multi-layer perceptrons (MLPs) to capture local residual deformations.
Our method outperforms existing approaches in various video editing tasks and produces coherent and high-quality edited video sequences.
arXiv Detail & Related papers (2024-06-10T17:59:46Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
We propose the Compression-Realize Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression domain.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - Training-Free Semantic Video Composition via Pre-trained Diffusion Model [96.0168609879295]
Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments.
We propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge.
Experimental results reveal that our pipeline successfully ensures the visual harmony and inter-frame coherence of the outputs.
arXiv Detail & Related papers (2024-01-17T13:07:22Z) - InstructVideo: Instructing Video Diffusion Models with Human Feedback [65.9590462317474]
We propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning.
InstructVideo has two key ingredients: 1) To ameliorate the cost of reward fine-tuning induced by generating through the full DDIM sampling chain, we recast reward fine-tuning as editing.
arXiv Detail & Related papers (2023-12-19T17:55:16Z) - FreeInit: Bridging Initialization Gap in Video Diffusion Models [42.38240625514987]
FreeInit is able to compensate the gap between training and inference, thus effectively improving the subject appearance and temporal consistency of generation results.
Experiments demonstrate that FreeInit consistently enhances the generation quality of various text-to-video diffusion models without additional training or fine-tuning.
arXiv Detail & Related papers (2023-12-12T18:59:16Z) - Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style
Transfer [13.098901971644656]
This paper proposes a zero-shot video stylization method named Style-A-Video.
Uses a generative pre-trained transformer with an image latent diffusion model to achieve a concise text-controlled video stylization.
Tests show that we can attain superior content preservation and stylistic performance while incurring less consumption than previous solutions.
arXiv Detail & Related papers (2023-05-09T14:03:27Z) - VideoFusion: Decomposed Diffusion Models for High-Quality Video
Generation [88.49030739715701]
This work presents a decomposed diffusion process via resolving the per-frame noise into a base noise that is shared among all frames and a residual noise that varies along the time axis.
Experiments on various datasets confirm that our approach, termed as VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
arXiv Detail & Related papers (2023-03-15T02:16:39Z) - Encoding in the Dark Grand Challenge: An Overview [60.9261003831389]
We propose a Grand Challenge on encoding low-light video sequences.
VVC achieves a high performance compared to simply denoising the video source prior to encoding.
The quality of the video streams can be further improved by employing a post-processing image enhancement method.
arXiv Detail & Related papers (2020-05-07T08:22:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.