Enhance-A-Video: Better Generated Video for Free
- URL: http://arxiv.org/abs/2502.07508v2
- Date: Thu, 13 Feb 2025 15:28:13 GMT
- Title: Enhance-A-Video: Better Generated Video for Free
- Authors: Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You
- Abstract summary: We introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos.
Our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning.
- Score: 57.620595159855064
- Abstract: DiT-based video generation has achieved remarkable results, but research into enhancing existing models remains relatively unexplored. In this work, we introduce Enhance-A-Video, a training-free approach to enhance the coherence and quality of DiT-based generated videos. The core idea is to strengthen cross-frame correlations based on non-diagonal temporal attention distributions. Thanks to its simple design, our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning. Across various DiT-based video generation models, our approach demonstrates promising improvements in both temporal consistency and visual quality. We hope this research can inspire future explorations in video generation enhancement.
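The abstract does not include an implementation, but the core idea can be illustrated with a short sketch: measure how strongly frames attend to one another via the non-diagonal entries of the temporal attention map, and use that signal to rescale the temporal attention output at inference time. The sketch below is an assumption-laden PyTorch illustration, not the authors' released code; the function name `enhance_temporal_attention` and the `enhance_weight` parameter are hypothetical.

```python
# Illustrative sketch only: boosting cross-frame correlations from the
# non-diagonal entries of a temporal attention map. Function and parameter
# names are hypothetical, not taken from the paper's code.
import torch


def enhance_temporal_attention(q, k, v, enhance_weight: float = 1.0):
    """Temporal self-attention over frames with a training-free enhancement.

    q, k, v: (batch, frames, dim) tensors, one token per frame at a fixed
             spatial location.
    enhance_weight: hypothetical scalar controlling enhancement strength.
    """
    dim = q.shape[-1]
    attn = torch.softmax(q @ k.transpose(-2, -1) / dim**0.5, dim=-1)  # (B, F, F)

    # Cross-frame intensity: mean of the non-diagonal attention weights,
    # i.e. how much each frame attends to the other frames.
    frames = attn.shape[-1]
    off_diag = ~torch.eye(frames, dtype=torch.bool, device=attn.device)
    cross_frame_intensity = attn[..., off_diag].mean(dim=-1)  # (B,)

    # Rescale the attention output with a factor derived from the
    # cross-frame intensity; larger factors strengthen temporal coherence.
    out = attn @ v  # (B, F, dim)
    factor = 1.0 + enhance_weight * cross_frame_intensity.view(-1, 1, 1)
    return factor * out
```

In practice, a hook of this kind would be applied inside each temporal attention block of a DiT-based video model at inference time, which is what makes the enhancement training-free.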
Related papers
- Improving Video Generation with Human Feedback [81.48120703718774]
Video generation has achieved significant advances, but issues like unsmooth motion and misalignment between videos and prompts persist.
We develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model.
We introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy.
arXiv Detail & Related papers (2025-01-23T18:55:41Z)
- OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization [30.6130504613716]
We introduce OnlineVPO, a preference learning approach tailored specifically for video diffusion models.
By employing the video reward model to offer concise video feedback on the fly, OnlineVPO offers effective and efficient preference guidance.
arXiv Detail & Related papers (2024-12-19T18:34:50Z)
- The Dawn of Video Generation: Preliminary Explorations with SORA-like Models [14.528428430884015]
High-quality video generation, encompassing text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation, holds considerable significance in content creation.
Models like SORA have advanced the generation of videos with higher resolution, more natural motion, better vision-language alignment, and increased controllability.
arXiv Detail & Related papers (2024-10-07T17:35:10Z)
- PEEKABOO: Interactive Video Generation via Masked-Diffusion [16.27046318032809]
We introduce the first solution to equip module-based video generation models with video control.
We present Peekaboo, which integrates seamlessly with current video generation models, offering control without additional training or inference overhead.
Our extensive qualitative and quantitative assessments reveal that Peekaboo achieves up to a 3.8x improvement in mIoU over baseline models.
arXiv Detail & Related papers (2023-12-12T18:43:05Z)
- Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation.
We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
- Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning [50.60891619269651]
Control-A-Video is a controllable T2V diffusion model that can generate videos conditioned on text prompts and reference control maps like edge and depth maps.
We propose novel strategies to incorporate content prior and motion prior into the diffusion-based generation process.
Our framework generates higher-quality, more consistent videos compared to existing state-of-the-art methods in controllable text-to-video generation.
arXiv Detail & Related papers (2023-05-23T09:03:19Z)
- Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation [55.36617538438858]
We propose a novel approach that strengthens the interaction between spatial and temporal perceptions.
We curate a large-scale and open-source video dataset called HD-VG-130M.
arXiv Detail & Related papers (2023-05-18T11:06:15Z)
- Video Diffusion Models [47.99413440461512]
Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
We propose a diffusion model for video generation that shows very promising initial results.
We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
arXiv Detail & Related papers (2022-04-07T14:08:02Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates videos of superior quality compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)