VideoComposer: Compositional Video Synthesis with Motion Controllability
- URL: http://arxiv.org/abs/2306.02018v2
- Date: Tue, 6 Jun 2023 03:54:10 GMT
- Title: VideoComposer: Compositional Video Synthesis with Motion Controllability
- Authors: Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang,
Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou
- Abstract summary: VideoComposer allows users to flexibly compose a video with textual conditions, spatial conditions, and more importantly temporal conditions.
We introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics.
In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs.
- Score: 52.4714732331632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pursuit of controllability as a higher standard of visual content
creation has yielded remarkable progress in customizable image synthesis.
However, achieving controllable video synthesis remains challenging due to the
large variation of temporal dynamics and the requirement of cross-frame
temporal consistency. Based on the paradigm of compositional generation, this
work presents VideoComposer that allows users to flexibly compose a video with
textual conditions, spatial conditions, and more importantly temporal
conditions. Specifically, considering the characteristic of video data, we
introduce the motion vector from compressed videos as an explicit control
signal to provide guidance regarding temporal dynamics. In addition, we develop
a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified
interface to effectively incorporate the spatial and temporal relations of
sequential inputs, with which the model could make better use of temporal
conditions and hence achieve higher inter-frame consistency. Extensive
experimental results suggest that VideoComposer is able to control the spatial
and temporal patterns simultaneously within a synthesized video in various
forms, such as text description, sketch sequence, reference video, or even
simply hand-crafted motions. The code and models will be publicly available at
https://videocomposer.github.io.
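A minimal sketch can make the abstract's two ingredients concrete: the explicit motion signal and a sequence-aware condition encoder. VideoComposer itself decodes motion vectors directly from the compressed bitstream and trains a full STC-encoder; the Python sketch below is only a stand-in under stated assumptions, substituting dense optical flow (OpenCV Farnebäck) for codec motion vectors and a single temporal self-attention layer for the STC-encoder. The names motion_condition and TemporalConditionEncoder are illustrative, not the authors' API.

```python
# Hedged sketch: optical flow stands in for codec motion vectors, and one
# temporal self-attention layer stands in for the full STC-encoder.
import cv2
import numpy as np
import torch
import torch.nn as nn


def motion_condition(frames: list[np.ndarray]) -> torch.Tensor:
    """Approximate per-frame motion vectors with dense optical flow.

    frames: list of HxWx3 uint8 RGB frames.
    Returns a (T-1, 2, H, W) float tensor of (dx, dy) displacements.
    """
    grays = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY) for f in frames]
    flows = []
    for prev, nxt in zip(grays[:-1], grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(
            prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)   # H x W x 2
        flows.append(torch.from_numpy(flow).permute(2, 0, 1))
    return torch.stack(flows).float()


class TemporalConditionEncoder(nn.Module):
    """STC-encoder-flavoured sketch: embed each frame's condition map with a
    small CNN, then let frames exchange information via temporal attention."""

    def __init__(self, in_channels: int, dim: int = 128) -> None:
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(in_channels, dim, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1),
        )
        self.temporal = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (T, C, H, W) sequence of condition maps (e.g. the flow above).
        feats = self.spatial(cond)                     # (T, dim, H/4, W/4)
        t, d, h, w = feats.shape
        tokens = feats.flatten(2).permute(2, 0, 1)     # per-location temporal sequences
        fused, _ = self.temporal(tokens, tokens, tokens)
        return fused.permute(1, 2, 0).reshape(t, d, h, w)


if __name__ == "__main__":
    # Toy usage with random "frames"; a real pipeline would decode a video here.
    frames = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(8)]
    motion = motion_condition(frames)                  # (7, 2, 64, 64)
    encoder = TemporalConditionEncoder(in_channels=2)
    temporal_tokens = encoder(motion)                  # (7, 128, 16, 16)
    print(motion.shape, temporal_tokens.shape)
```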
Related papers
- FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance [47.88160253507823]
We introduce FancyVideo, an innovative video generator that improves the existing text-control mechanism with a Cross-frame Textual Guidance Module (CTGM).
CTGM incorporates a Temporal Information Injector (TII), Temporal Affinity Refiner (TAR), and Temporal Feature Booster (TFB) at the beginning, middle, and end of cross-attention, respectively.
arXiv Detail & Related papers (2024-08-15T14:47:44Z)
- Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This contrasts with existing video models, which synthesize distant keyframes and then apply temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
arXiv Detail & Related papers (2024-01-23T18:05:25Z)
- ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [33.37279673304]
We introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text.
ConditionVideo generates realistic dynamic videos from random noise or given scene videos.
Our method exhibits superior performance in terms of frame consistency, CLIP score, and conditional accuracy, outperforming the other compared methods.
arXiv Detail & Related papers (2023-10-11T17:46:28Z)
- Edit Temporal-Consistent Videos with Image Diffusion Model [49.88186997567138]
Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing.
The proposed method achieves state-of-the-art performance in both temporal consistency and video editing capability.
arXiv Detail & Related papers (2023-08-17T16:40:55Z)
- ControlVideo: Training-free Controllable Text-to-Video Generation [117.06302461557044]
ControlVideo is a framework to enable natural and efficient text-to-video generation.
It generates both short and long videos within several minutes using one NVIDIA 2080Ti.
arXiv Detail & Related papers (2023-05-22T14:48:53Z)
- MoStGAN-V: Video Generation with Temporal Motion Styles [28.082294960744726]
Previous works attempt to generate videos of arbitrary length either in an autoregressive manner or by regarding time as a continuous signal.
We argue that a single time-agnostic latent vector of a style-based generator is insufficient to model varied yet temporally consistent motions.
We introduce additional time-dependent motion styles to model diverse motion patterns.
arXiv Detail & Related papers (2023-04-05T22:47:12Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moiré patterns, appearing as color distortions, severely degrade image and video quality when a screen is filmed with a digital camera.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Vid-ODE: Continuous-Time Video Generation with Neural Ordinary Differential Equation [42.85126020237214]
We propose continuous-time video generation by combining neural ODE (Vid-ODE) with pixel-level video processing techniques.
Vid-ODE is the first work to successfully perform continuous-time video generation using real-world videos.
arXiv Detail & Related papers (2020-10-16T06:50:47Z)
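To make the Vid-ODE entry's continuous-time idea concrete, here is a hedged sketch, not the authors' architecture: a learned vector field dz/dt = f(z, t) is integrated with plain Euler steps, and frames are decoded from the latent state at arbitrary real-valued timestamps. The class LatentODE, the hand-rolled Euler integrator, and the toy decoder are all illustrative assumptions.

```python
# Hedged sketch of continuous-time video generation with a latent ODE,
# in the spirit of Vid-ODE (not the authors' model). A learned vector field
# f(z, t) is integrated with explicit Euler steps; frames can then be decoded
# at any real-valued timestamp, not just on a fixed frame grid.
import torch
import torch.nn as nn


class LatentODE(nn.Module):
    def __init__(self, latent_dim: int = 64) -> None:
        super().__init__()
        # dz/dt = f(z, t): a small MLP conditioned on the current time.
        self.field = nn.Sequential(
            nn.Linear(latent_dim + 1, 128), nn.Tanh(), nn.Linear(128, latent_dim))
        # Toy decoder: latent -> 3x32x32 frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 3 * 32 * 32), nn.Sigmoid())

    def integrate(self, z0: torch.Tensor, timestamps: torch.Tensor,
                  steps_per_interval: int = 8) -> torch.Tensor:
        """Euler-integrate z0 across the sorted timestamps, returning one
        latent per timestamp. z0: (B, D), timestamps: (T,)."""
        z, latents, t = z0, [z0], timestamps[0]
        for t_next in timestamps[1:]:
            dt = (t_next - t) / steps_per_interval
            for k in range(steps_per_interval):
                t_cur = t + k * dt
                inp = torch.cat([z, t_cur.expand(z.shape[0], 1)], dim=1)
                z = z + dt * self.field(inp)           # explicit Euler step
            latents.append(z)
            t = t_next
        return torch.stack(latents, dim=1)             # (B, T, D)

    def forward(self, z0: torch.Tensor, timestamps: torch.Tensor) -> torch.Tensor:
        latents = self.integrate(z0, timestamps)
        frames = self.decoder(latents)                 # (B, T, 3*32*32)
        return frames.view(z0.shape[0], -1, 3, 32, 32)


if __name__ == "__main__":
    model = LatentODE()
    z0 = torch.randn(2, 64)
    # Irregular, continuous timestamps: frames need not sit on a fixed grid.
    ts = torch.tensor([0.0, 0.4, 0.55, 1.0])
    video = model(z0, ts)
    print(video.shape)  # torch.Size([2, 4, 3, 32, 32])
```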
This list is automatically generated from the titles and abstracts of the papers on this site.