Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
- URL: http://arxiv.org/abs/2412.11755v1
- Date: Mon, 16 Dec 2024 13:19:41 GMT
- Title: Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
- Authors: Tianyi Zhu, Dongwei Ren, Qilong Wang, Xiaohe Wu, Wangmeng Zuo,
- Abstract summary: generative inbetweening aims to generate intermediate frame sequences by utilizing two key frames as input.
We propose a Frame-wise Conditions-driven Video Generation (FCVG) method that significantly enhances the temporal stability of interpolated video frames.
Our FCVG demonstrates the capability to generate temporally stable videos using both linear and non-linear curves.
- Score: 63.43583844248389
- License:
- Abstract: Generative inbetweening aims to generate intermediate frame sequences by utilizing two key frames as input. Although remarkable progress has been made in video generation models, generative inbetweening still faces challenges in maintaining temporal stability due to the ambiguous interpolation path between two key frames. This issue becomes particularly severe when there is a large motion gap between input frames. In this paper, we propose a straightforward yet highly effective Frame-wise Conditions-driven Video Generation (FCVG) method that significantly enhances the temporal stability of interpolated video frames. Specifically, our FCVG provides an explicit condition for each frame, making it much easier to identify the interpolation path between two input frames and thus ensuring temporally stable production of visually plausible video frames. To achieve this, we suggest extracting matched lines from two input frames that can then be easily interpolated frame by frame, serving as frame-wise conditions seamlessly integrated into existing video generation models. In extensive evaluations covering diverse scenarios such as natural landscapes, complex human poses, camera movements and animations, existing methods often exhibit incoherent transitions across frames. In contrast, our FCVG demonstrates the capability to generate temporally stable videos using both linear and non-linear interpolation curves. Our project page and code are available at \url{https://fcvg-inbetween.github.io/}.
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named textbfRecurrent vtextbfIdeo textbfGAN textbfInversion and etextbfDiting (RIGID)
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing video with arbitrary length, from a few frames to even infinite, using generative adversarial networks (GANs)
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame aims to synthesize non-exist intermediate frames in a video sequence.
The temporal priors of videos, i.e. frames per second (FPS) and frame exposure time, may vary from different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z) - ALANET: Adaptive Latent Attention Network forJoint Video Deblurring and
Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ combination of self-attention and cross-attention module between consecutive frames in the latent space to generate optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
arXiv Detail & Related papers (2020-08-31T21:11:53Z) - Deep Sketch-guided Cartoon Video Inbetweening [24.00033622396297]
We propose a framework to produce cartoon videos by fetching the color information from two inputs while following the animated motion guided by a user sketch.
By explicitly considering the correspondence between frames and the sketch, we can achieve higher quality results than other image synthesis methods.
arXiv Detail & Related papers (2020-08-10T14:22:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.