Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
- URL: http://arxiv.org/abs/2312.13834v1
- Date: Wed, 20 Dec 2023 01:49:47 GMT
- Title: Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
- Authors: Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil
Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda
- Abstract summary: We introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications.
Our approach centers on the concept of anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames.
A comprehensive user study, involving 1000 generated samples, confirms that our approach delivers superior quality, decisively outperforming established methods.
- Score: 51.44526084095757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Fairy, a minimalist yet robust adaptation of
image-editing diffusion models, enhancing them for video editing applications.
Our approach centers on the concept of anchor-based cross-frame attention, a
mechanism that implicitly propagates diffusion features across frames, ensuring
superior temporal coherence and high-fidelity synthesis. Fairy not only
addresses limitations of previous models, including memory and processing
speed, but also improves temporal consistency through a unique data
augmentation strategy that renders the model equivariant to affine
transformations in both source and target images. Remarkably efficient, Fairy generates
120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds,
outpacing prior works by at least 44x. A comprehensive user study, involving
1000 generated samples, confirms that our approach delivers superior quality,
decisively outperforming established methods.
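The anchor-based cross-frame attention described above can be pictured with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' implementation: the function name, tensor shapes, and choice of anchor frames are assumptions made for the example. The only idea carried over from the abstract is that every frame's queries attend to keys and values gathered from a small set of anchor frames, so the anchors' diffusion features propagate implicitly to all frames.

```python
# Minimal sketch of anchor-based cross-frame attention (assumed shapes and
# names, not the paper's code). Each frame's queries attend to keys/values
# gathered from a few anchor frames, implicitly propagating the anchors'
# diffusion features to every frame.
import torch
import torch.nn.functional as F

def anchor_cross_frame_attention(q, k, v, anchor_idx):
    """q, k, v: (frames, tokens, dim); anchor_idx: list of anchor frame ids."""
    # Stack the anchor frames' keys/values into one shared token sequence.
    k_anchor = k[anchor_idx].reshape(1, -1, k.shape[-1])   # (1, A*tokens, dim)
    v_anchor = v[anchor_idx].reshape(1, -1, v.shape[-1])
    # Broadcast the shared anchors so every frame attends to the same tokens.
    k_anchor = k_anchor.expand(q.shape[0], -1, -1)
    v_anchor = v_anchor.expand(q.shape[0], -1, -1)
    return F.scaled_dot_product_attention(q, k_anchor, v_anchor)

# Toy usage: 8 frames, 24x32 latent tokens, 64-dim features, 2 anchor frames.
frames, tokens, dim = 8, 24 * 32, 64
q, k, v = (torch.randn(frames, tokens, dim) for _ in range(3))
out = anchor_cross_frame_attention(q, k, v, anchor_idx=[0, frames // 2])
print(out.shape)  # torch.Size([8, 768, 64])
```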
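The abstract also attributes improved temporal consistency to a data augmentation strategy that makes the model equivariant to affine transformations of both source and target images. One plausible reading, sketched below under that assumption, is to sample a single random affine transform per training pair and apply it identically to the source and the edited target; the transform ranges and function name are illustrative, not taken from the paper.

```python
# Sketch of a paired affine augmentation (assumed details, not the paper's
# recipe): the same randomly sampled affine transform is applied to both the
# source and target images so that the learned editing map commutes with
# affine transformations, i.e. becomes equivariant to them.
import random
import torchvision.transforms.functional as TF

def paired_affine_augment(source, target):
    """source, target: (C, H, W) image tensors; returns the augmented pair."""
    angle = random.uniform(-15.0, 15.0)                              # degrees
    translate = [random.randint(-10, 10), random.randint(-10, 10)]   # pixels
    scale = random.uniform(0.9, 1.1)
    shear = [random.uniform(-5.0, 5.0)]                              # degrees
    # Applying the identical transform to both images encodes the equivariance.
    source = TF.affine(source, angle, translate, scale, shear)
    target = TF.affine(target, angle, translate, scale, shear)
    return source, target
```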
Related papers
- Make a Cheap Scaling: A Self-Cascade Diffusion Model for
Higher-Resolution Adaptation [112.08287900261898]
This paper proposes a novel self-cascade diffusion model for rapid adaptation to higher-resolution image and video generation.
Our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters.
Experiments demonstrate that our approach can quickly adapt to higher-resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
arXiv Detail & Related papers (2024-02-16T07:48:35Z)
- AnimateLCM: Accelerating the Animation of Personalized Diffusion Models
and Adapters with Decoupled Consistency Learning [47.681633892135125]
We propose AnimateLCM, allowing for high-fidelity video generation within minimal steps.
Instead of directly conducting consistency learning on the raw video dataset, we propose a decoupled consistency learning strategy.
We validate the proposed strategy in image-conditioned video generation and layout-conditioned video generation, all achieving top-performing results.
arXiv Detail & Related papers (2024-02-01T16:58:11Z)
- FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video
Synthesis [66.2611385251157]
Diffusion models have transformed image-to-image (I2I) synthesis and are now permeating into videos.
This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video.
arXiv Detail & Related papers (2023-12-29T16:57:12Z)
- Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World
Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation [93.18163456287164]
This paper proposes a novel text-guided video-to-video translation framework to adapt image models to videos.
Our framework achieves global style and local texture temporal consistency at a low cost.
arXiv Detail & Related papers (2023-06-13T17:52:23Z)
- Streaming Radiance Fields for 3D Video Synthesis [32.856346090347174]
We present an explicit-grid based method for reconstructing streaming radiance fields for novel view synthesis of real world dynamic scenes.
Experiments on challenging video sequences demonstrate that our approach achieves a training speed of 15 seconds per frame with competitive rendering quality.
arXiv Detail & Related papers (2022-10-26T16:23:02Z)
- Video Diffusion Models [47.99413440461512]
Generating temporally coherent high fidelity video is an important milestone in generative modeling research.
We propose a diffusion model for video generation that shows very promising initial results.
We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark.
arXiv Detail & Related papers (2022-04-07T14:08:02Z)
- OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z)
- Robust High-Resolution Video Matting with Temporal Guidance [14.9739044990367]
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
Our method is much lighter than previous approaches and can process 4K at 76 FPS and HD at 104 FPS on an Nvidia GTX 1080Ti GPU.
arXiv Detail & Related papers (2021-08-25T23:48:15Z)