MoStGAN-V: Video Generation with Temporal Motion Styles
- URL: http://arxiv.org/abs/2304.02777v1
- Date: Wed, 5 Apr 2023 22:47:12 GMT
- Title: MoStGAN-V: Video Generation with Temporal Motion Styles
- Authors: Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
- Abstract summary: Previous works attempt to generate videos of arbitrary length either in an autoregressive manner or by treating time as a continuous signal.
We argue that a single time-agnostic latent vector in a style-based generator is insufficient to model diverse and temporally consistent motions.
We introduce additional time-dependent motion styles to model diverse motion patterns.
- Score: 28.082294960744726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video generation remains a challenging task due to spatiotemporal complexity
and the requirement of synthesizing diverse motions with temporal consistency.
Previous works attempt to generate videos of arbitrary length either in an
autoregressive manner or by treating time as a continuous signal. However, they
struggle to synthesize detailed and diverse motions with temporal coherence and
tend to generate repetitive scenes after a few time steps. In this work, we
argue that a single time-agnostic latent vector in a style-based generator is
insufficient to model diverse and temporally consistent motions. Hence, we
introduce additional time-dependent motion styles to model diverse motion
patterns. In addition, a Motion Style Attention modulation mechanism, dubbed
MoStAtt, is proposed to augment frames with vivid dynamics at each specific
scale (i.e., layer): it assigns an attention score to each motion style with
respect to the deconvolution filter weights of the target synthesis layer and
softly attends to the different motion styles for weight modulation.
Experimental results show that our model achieves state-of-the-art performance
on four unconditional $256^2$ video synthesis benchmarks trained with only 3
frames per clip, and produces better qualitative results with respect to
dynamic motions. Code and videos are available at
https://github.com/xiaoqian-shen/MoStGAN-V.
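
For intuition only, a minimal PyTorch sketch of the attention-based weight modulation described above might look as follows, assuming a single synthesis layer, a small set of per-frame motion styles, and StyleGAN2-style per-sample modulated convolution; class and argument names are hypothetical, and demodulation, upsampling, and the rest of the generator are omitted. The linked repository is the authoritative implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoStAttModulation(nn.Module):
    """Hypothetical sketch of motion-style attention over conv weights.

    Attention scores are computed between each motion style and the layer's
    filter weights; the softly attended style then modulates the weights per
    input channel (demodulation and upsampling of the real layer are omitted).
    """

    def __init__(self, style_dim: int, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        # project flattened filters and motion styles into a shared space
        self.key = nn.Linear(in_ch * k * k, style_dim)
        self.to_scale = nn.Linear(style_dim, in_ch)

    def forward(self, x: torch.Tensor, motion_styles: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, H, W); motion_styles: (B, n_styles, style_dim)
        b, n, d = motion_styles.shape
        out_ch = self.weight.shape[0]
        keys = self.key(self.weight.flatten(1))                   # (out_ch, style_dim)
        attn = torch.einsum('bnd,od->bno', motion_styles, keys) / d ** 0.5
        attn = attn.softmax(dim=1)                                # soft attention over styles
        attended = torch.einsum('bno,bnd->bod', attn, motion_styles)  # (B, out_ch, style_dim)
        scale = self.to_scale(attended)                           # (B, out_ch, in_ch)
        w = self.weight.unsqueeze(0) * (1 + scale)[..., None, None]   # modulated filters
        # grouped conv applies a different filter bank per sample
        x = x.reshape(1, b * x.shape[1], *x.shape[2:])
        w = w.reshape(b * out_ch, *self.weight.shape[1:])
        out = F.conv2d(x, w, padding=self.weight.shape[-1] // 2, groups=b)
        return out.reshape(b, out_ch, *out.shape[2:])

# usage sketch
x = torch.randn(2, 32, 16, 16)                 # feature maps for 2 frames
styles = torch.randn(2, 4, 64)                 # 4 time-dependent motion styles each
out = MoStAttModulation(style_dim=64, in_ch=32, out_ch=32)(x, styles)  # (2, 32, 16, 16)
```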
Related papers
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - VideoComposer: Compositional Video Synthesis with Motion Controllability [52.4714732331632]
- VideoComposer: Compositional Video Synthesis with Motion Controllability [52.4714732331632]
VideoComposer allows users to flexibly compose a video with textual conditions, spatial conditions, and, more importantly, temporal conditions.
We introduce the motion vector from compressed videos as an explicit control signal to provide guidance regarding temporal dynamics.
In addition, we develop a Spatio-Temporal Condition encoder (STC-encoder) that serves as a unified interface to effectively incorporate the spatial and temporal relations of sequential inputs.
arXiv Detail & Related papers (2023-06-03T06:29:02Z) - Continuous-Time Video Generation via Learning Motion Dynamics with
- Continuous-Time Video Generation via Learning Motion Dynamics with Neural ODE [26.13198266911874]
We propose a novel video generation approach that learns separate distributions for motion and appearance.
We employ a two-stage approach in which the first stage converts a noise vector to a sequence of keypoints at arbitrary frame rates, and the second stage synthesizes videos based on the given keypoint sequence and the appearance noise vector.
arXiv Detail & Related papers (2021-12-21T03:30:38Z) - Dance In the Wild: Monocular Human Animation with Neural Dynamic
- Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis [56.550999933048075]
We propose a video-based synthesis method that tackles the challenges of in-the-wild videos and demonstrates high-quality results.
We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes.
We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2021-11-10T20:18:57Z) - Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)