UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
- URL: http://arxiv.org/abs/2412.09389v1
- Date: Thu, 12 Dec 2024 15:56:26 GMT
- Title: UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
- Authors: Delong Liu, Zhaohui Hou, Mingjie Zhan, Shihao Han, Zhicheng Zhao, Fei Su
- Abstract summary: We propose a non-invasive plug-in called Uniform Frame Organizer (UFO).
UFO is compatible with any diffusion-based video generation model.
The training for UFO is simple, efficient, requires minimal resources, and supports stylized training.
- Score: 20.121885706650758
- License:
- Abstract: Recently, diffusion-based video generation models have achieved significant success. However, existing models often suffer from issues like weak consistency and declining image quality over time. To overcome these challenges, inspired by aesthetic principles, we propose a non-invasive plug-in called Uniform Frame Organizer (UFO), which is compatible with any diffusion-based video generation model. The UFO comprises a series of adaptive adapters with adjustable intensities, which can significantly enhance the consistency between the foreground and background of videos and improve image quality without altering the original model parameters when integrated. The training for UFO is simple, efficient, requires minimal resources, and supports stylized training. Its modular design allows for the combination of multiple UFOs, enabling the customization of personalized video generation models. Furthermore, the UFO also supports direct transferability across different models of the same specification without the need for specific retraining. The experimental results indicate that UFO effectively enhances video generation quality and demonstrates its superiority in public video generation benchmarks. The code will be publicly available at https://github.com/Delong-liu-bupt/UFO.
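To make the plug-in idea concrete, below is a minimal PyTorch-style sketch of an adjustable-intensity adapter attached to a frozen backbone; the class name, layer sizes, and residual placement are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class UFOAdapter(nn.Module):
    """Illustrative adapter with an adjustable intensity scale.

    The frozen backbone's feature map passes through untouched; the adapter
    adds a small residual whose strength is controlled by `intensity`.
    (Names and placement are assumptions, not the paper's released code.)
    """
    def __init__(self, channels: int, intensity: float = 1.0):
        super().__init__()
        self.intensity = intensity  # adjustable at inference time
        self.down = nn.Conv2d(channels, channels // 4, kernel_size=1)
        self.act = nn.SiLU()
        self.up = nn.Conv2d(channels // 4, channels, kernel_size=1)
        nn.init.zeros_(self.up.weight)  # start as an identity plug-in
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.intensity * self.up(self.act(self.down(h)))

def attach_ufo(backbone: nn.Module, channels: int, intensity: float = 1.0):
    """Freeze the base video diffusion model and return trainable adapters.

    Only the adapters receive gradients, so the base weights stay unchanged
    and the plug-in can be removed or re-scaled at will.
    """
    for p in backbone.parameters():
        p.requires_grad_(False)  # base model is never modified
    return nn.ModuleList(
        UFOAdapter(channels, intensity) for _ in range(4)  # e.g. one per block
    )
```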
Related papers
- Enhance-A-Video: Better Generated Video for Free [57.620595159855064]
We introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos.
Our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning.
arXiv Detail & Related papers (2025-02-11T12:22:35Z) - ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free, plug-and-play method for generative video models.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models [76.85329896854189]
We investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model.
We shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.
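As a rough illustration of the strategy described above, the sketch below freezes temporal layers of a video U-Net and leaves spatial layers trainable for image finetuning; the name-based "temporal"/"spatial" convention is an assumption for illustration, not VideoCrafter2's actual module layout.

```python
import torch.nn as nn

def finetune_spatial_only(video_unet: nn.Module) -> list[str]:
    """Freeze temporal layers, keep spatial layers trainable.

    Assumes temporal-module parameter names contain 'temporal', which is an
    illustrative convention rather than VideoCrafter2's actual layout.
    """
    for name, param in video_unet.named_parameters():
        param.requires_grad = "temporal" not in name
    # The remaining trainable (spatial) weights are finetuned on
    # high-quality images to lift visual quality without degrading motion.
    return [n for n, p in video_unet.named_parameters() if p.requires_grad]
```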
arXiv Detail & Related papers (2024-01-17T08:30:32Z) - Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions [94.03133100056372]
Moonshot is a new video generation model that conditions simultaneously on multimodal inputs of image and text.
The model can be easily repurposed for a variety of generative applications, such as personalized video generation, image animation and video editing.
arXiv Detail & Related papers (2024-01-03T16:43:47Z) - BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [40.73982918337828]
We propose a training-free, general-purpose video synthesis framework, coined BIVDiff, which bridges specific image diffusion models and general text-to-video foundation diffusion models.
Specifically, we first use a specific image diffusion model (e.g., ControlNet and Instruct Pix2Pix) for frame-wise video generation, then perform Mixed Inversion on the generated video, and finally input the inverted latents into the video diffusion models.
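A minimal sketch of this bridging procedure is given below; the model handles, the `generate`/`invert`/`denoise` calls, and the mixing ratio are placeholders standing in for BIVDiff's actual components, not its API.

```python
import torch

def bivdiff_style_bridge(image_model, video_model, prompt, control_frames,
                         mix_ratio: float = 0.5):
    """Illustrative three-stage bridge between image and video diffusion models."""
    # 1) Frame-wise generation with a task-specific image diffusion model
    #    (e.g. a ControlNet- or Instruct-Pix2Pix-style editor).
    frames = [image_model.generate(prompt, cond=f) for f in control_frames]
    video = torch.stack(frames, dim=0)  # (T, C, H, W)

    # 2) "Mixed Inversion": map the frames back into the diffusion latent
    #    space, blending a deterministic inversion with fresh noise so the
    #    video model has room to enforce temporal consistency.
    inverted = video_model.invert(video)       # DDIM-style inversion (placeholder)
    noise = torch.randn_like(inverted)
    latents = mix_ratio * inverted + (1.0 - mix_ratio) * noise

    # 3) Denoise the mixed latents with the video diffusion model, which
    #    supplies temporal coherence across frames.
    return video_model.denoise(latents, prompt)
```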
arXiv Detail & Related papers (2023-12-05T14:56:55Z) - Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z) - UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs [16.121569507866848]
We present UFOGen, a novel generative model designed for ultra-fast, one-step text-to-image synthesis.
Unlike conventional approaches, UFOGen adopts a hybrid methodology, integrating diffusion models with a GAN objective.
UFOGen excels in efficiently generating high-quality images conditioned on textual descriptions in a single step.
arXiv Detail & Related papers (2023-11-14T23:07:50Z) - Imagen Video: High Definition Video Generation with Diffusion Models [64.06483414521222]
Imagen Video is a text-conditional video generation system based on a cascade of video diffusion models.
We find that Imagen Video is not only capable of generating videos of high fidelity, but also has a high degree of controllability and world knowledge.
arXiv Detail & Related papers (2022-10-05T14:41:38Z) - UFO: Unified Feature Optimization [67.77936811483664]
This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models.
UFO aims to benefit each single task with a large-scale pretraining on all tasks.
UFO provides great convenience for flexible deployment, while maintaining the benefits of large-scale pretraining.
arXiv Detail & Related papers (2022-07-21T07:34:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.