Vid-ODE: Continuous-Time Video Generation with Neural Ordinary
Differential Equation
- URL: http://arxiv.org/abs/2010.08188v2
- Date: Tue, 30 Mar 2021 13:17:23 GMT
- Title: Vid-ODE: Continuous-Time Video Generation with Neural Ordinary
Differential Equation
- Authors: Sunghyun Park, Kangyeol Kim, Junsoo Lee, Jaegul Choo, Joonseok Lee,
Sookyung Kim, Edward Choi
- Abstract summary: We propose continuous-time video generation by combining neural ODE (Vid-ODE) with pixel-level video processing techniques.
Vid-ODE is the first work successfully performing continuous-time video generation using real-world videos.
- Score: 42.85126020237214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video generation models often operate under the assumption of fixed frame
rates, which leads to suboptimal performance when it comes to handling flexible
frame rates (e.g., increasing the frame rate of the more dynamic portion of the
video as well as handling missing video frames). To resolve the restricted
nature of existing video generation models' ability to handle arbitrary
timesteps, we propose continuous-time video generation by combining neural ODE
(Vid-ODE) with pixel-level video processing techniques. Using ODE-ConvGRU as an
encoder, a convolutional version of the recently proposed neural ODE, which
enables us to learn continuous-time dynamics, Vid-ODE can learn the
spatio-temporal dynamics of input videos of flexible frame rates. The decoder
integrates the learned dynamics function to synthesize video frames at any
given timesteps, where the pixel-level composition technique is used to
maintain the sharpness of individual frames. With extensive experiments on four
real-world video datasets, we verify that the proposed Vid-ODE outperforms
state-of-the-art approaches under various video generation settings, both
within the trained time range (interpolation) and beyond the range
(extrapolation). To the best of our knowledge, Vid-ODE is the first work
successfully performing continuous-time video generation using real-world
videos.
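
The abstract describes a three-part pipeline: an ODE-ConvGRU encoder that evolves a hidden state continuously between irregularly spaced observations, a decoder that integrates the learned dynamics function to any query timestep, and a pixel-level composition step to keep frames sharp. Below is a minimal sketch of that idea, assuming a fixed-step Euler solver, a single ConvGRU cell, and a simple mask-based composition head; the module names, layer sizes, and solver here are illustrative assumptions, not the authors' implementation (which, for example, also uses optical-flow warping in the decoder).

```python
# Minimal sketch of the continuous-time encode/decode idea behind Vid-ODE.
# Assumptions: Euler integration, one ConvGRU cell, a mask-based composition
# head; not the authors' code.
import torch
import torch.nn as nn


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: gates computed with 3x3 convolutions."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, 3, padding=1)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde


class ConvODEFunc(nn.Module):
    """Dynamics dh/dt = f(h) parameterized by convolutions."""
    def __init__(self, hid_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(hid_ch, hid_ch, 3, padding=1), nn.Tanh(),
            nn.Conv2d(hid_ch, hid_ch, 3, padding=1),
        )

    def forward(self, h):
        return self.net(h)


def odeint_euler(func, h, t0, t1, steps=8):
    """Fixed-step Euler integration of dh/dt = func(h) from t0 to t1."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * func(h)
    return h


class VidODESketch(nn.Module):
    def __init__(self, in_ch=3, hid_ch=32):
        super().__init__()
        self.hid_ch = hid_ch
        self.enc = nn.Conv2d(in_ch, hid_ch, 3, padding=1)
        self.cell = ConvGRUCell(hid_ch, hid_ch)
        self.ode = ConvODEFunc(hid_ch)
        # Decoder head: predicted frame plus a mask for pixel-level composition.
        self.dec = nn.Conv2d(hid_ch, in_ch + 1, 3, padding=1)

    def forward(self, frames, obs_times, query_times):
        # frames: (B, T, C, H, W), observed at irregular, ascending obs_times.
        B, T, C, H, W = frames.shape
        h = frames.new_zeros(B, self.hid_ch, H, W)
        t = obs_times[0]
        # ODE-ConvGRU encoding: evolve h continuously, update at each observation.
        for i in range(T):
            h = odeint_euler(self.ode, h, t, obs_times[i])
            h = self.cell(self.enc(frames[:, i]), h)
            t = obs_times[i]
        last = frames[:, -1]
        outputs = []
        # Decode at arbitrary query times by integrating the learned dynamics.
        for tq in query_times:
            h = odeint_euler(self.ode, h, t, tq)
            t = tq
            out = self.dec(h)
            pred, mask = out[:, :C], torch.sigmoid(out[:, C:])
            # Pixel-level composition: blend prediction with the last seen frame.
            outputs.append(mask * pred + (1 - mask) * last)
        return torch.stack(outputs, dim=1)
```

For instance, encoding five frames observed at times [0.0, 0.1, 0.35, 0.6, 1.0] and querying [0.2, 0.8] corresponds to interpolation within the observed range, while querying [1.25, 1.5] corresponds to extrapolation beyond it.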
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce a downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z) - ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation [81.90265212988844]
We propose a training-free video interpolation method for generative video models that works in a plug-and-play manner.
We transform a video model into a self-cascaded video diffusion model with the designed hidden state correction modules.
Our training-free method is even comparable to trained models supported by huge compute resources and large-scale datasets.
arXiv Detail & Related papers (2024-06-03T00:31:13Z) - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling its spatiotemporal dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [33.37279673304]
We introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text.
ConditionVideo generates realistic dynamic videos from random noise or given scene videos.
Our method exhibits superior performance in terms of frame consistency, clip score, and conditional accuracy, outperforming other compared methods.
arXiv Detail & Related papers (2023-10-11T17:46:28Z) - VideoComposer: Compositional Video Synthesis with Motion Controllability [52.4714732331632]
VideoComposer allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions.
We introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics.
In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs.
arXiv Detail & Related papers (2023-06-03T06:29:02Z) - ControlVideo: Training-free Controllable Text-to-Video Generation [117.06302461557044]
ControlVideo is a framework to enable natural and efficient text-to-video generation.
It generates both short and long videos within several minutes using one NVIDIA 2080Ti.
arXiv Detail & Related papers (2023-05-22T14:48:53Z) - Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing video with arbitrary length, from a few frames to even infinite, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)