LaMD: Latent Motion Diffusion for Video Generation
- URL: http://arxiv.org/abs/2304.11603v1
- Date: Sun, 23 Apr 2023 10:32:32 GMT
- Title: LaMD: Latent Motion Diffusion for Video Generation
- Authors: Yaosi Hu, Zhenzhong Chen, Chong Luo
- Abstract summary: The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements.
- Score: 69.4111397077229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating coherent and natural movement is the key challenge in video
generation. This research proposes to condense video generation into a problem
of motion generation, to improve the expressiveness of motion and make video
generation more manageable. This can be achieved by breaking down the video
generation process into latent motion generation and video reconstruction. We
present a latent motion diffusion (LaMD) framework, which consists of a
motion-decomposed video autoencoder and a diffusion-based motion generator, to
implement this idea. Through careful design, the motion-decomposed video
autoencoder can compress patterns in movement into a concise latent motion
representation. Meanwhile, the diffusion-based motion generator is able to
efficiently generate realistic motion on a continuous latent space under
multi-modal conditions, at a cost that is similar to that of image diffusion
models. Results show that LaMD generates high-quality videos with a wide range
of motions, from stochastic dynamics to highly controllable movements. It
achieves new state-of-the-art performance on benchmark datasets, including
BAIR, Landscape and CATER-GENs, for Image-to-Video (I2V) and
Text-Image-to-Video (TI2V) generation. The source code of LaMD will be made
available soon.
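As a concrete illustration of the two-stage pipeline the abstract describes, here is a minimal PyTorch-style sketch: a motion-decomposed autoencoder compresses a clip into a content code plus a compact motion latent, and a diffusion model generates that motion latent conditioned on a given image (the I2V setting) before the decoder reconstructs frames. All module names, tensor shapes, and the DDPM-style sampler are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the LaMD two-stage idea; shapes and modules are assumptions.
import torch
import torch.nn as nn

class MotionDecomposedAutoencoder(nn.Module):
    """Compresses a video into a content code (from the first frame) plus a
    concise latent motion vector; the decoder recombines them into frames."""
    def __init__(self, motion_dim=64, frames=8, size=32):
        super().__init__()
        self.frames, self.size = frames, size
        self.content_enc = nn.Sequential(                 # appearance from frame 0
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(), nn.Flatten())
        self.motion_enc = nn.Sequential(                   # movement from the whole clip
            nn.Conv3d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, motion_dim))
        self.decoder = nn.Sequential(
            nn.Linear(64 * (size // 4) ** 2 + motion_dim, 512), nn.ReLU(),
            nn.Linear(512, frames * 3 * size * size))

    def encode(self, video):                               # video: (B, 3, T, H, W)
        return self.content_enc(video[:, :, 0]), self.motion_enc(video)

    def decode(self, content, motion):
        out = self.decoder(torch.cat([content, motion], dim=1))
        return out.view(-1, 3, self.frames, self.size, self.size)

class LatentMotionDenoiser(nn.Module):
    """Epsilon-predictor on the continuous motion latent, conditioned on the
    content code of the given image."""
    def __init__(self, motion_dim=64, content_dim=64 * 8 * 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 1 + content_dim, 256), nn.SiLU(),
            nn.Linear(256, motion_dim))

    def forward(self, z_t, t, content):
        t_feat = t.float().unsqueeze(1) / 1000.0
        return self.net(torch.cat([z_t, t_feat, content], dim=1))

@torch.no_grad()
def sample_video(ae, denoiser, first_frame, steps=50, motion_dim=64):
    """DDPM-style ancestral sampling in the motion latent space, then decode."""
    content = ae.content_enc(first_frame)
    z = torch.randn(first_frame.size(0), motion_dim)
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas, alpha_bar = 1.0 - betas, torch.cumprod(1.0 - betas, dim=0)
    for i in reversed(range(steps)):
        t = torch.full((z.size(0),), i)
        eps = denoiser(z, t, content)
        z = (z - betas[i] / torch.sqrt(1 - alpha_bar[i]) * eps) / torch.sqrt(alphas[i])
        if i > 0:
            z = z + torch.sqrt(betas[i]) * torch.randn_like(z)
    return ae.decode(content, z)

ae, denoiser = MotionDecomposedAutoencoder(), LatentMotionDenoiser()
video = sample_video(ae, denoiser, torch.randn(2, 3, 32, 32))
print(video.shape)  # torch.Size([2, 3, 8, 32, 32])
```

The point of the sketch is the cost argument in the abstract: diffusion runs only over a small motion latent, so sampling is comparable in price to an image diffusion model, while the autoencoder carries the heavy pixel-space reconstruction once.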
Related papers
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo).
ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
- MotionCraft: Physics-based Zero-Shot Video Generation [22.33113030344355]
MotionCraft is a new zero-shot video generator to craft physics-based and realistic videos.
We show that MotionCraft is able to warp the noise latent space of an image diffusion model, such as Stable Diffusion, by applying optical flow.
We compare our method with the state-of-the-art Text2Video-Zero, reporting qualitative and quantitative improvements.
arXiv Detail & Related papers (2024-05-22T11:44:57Z)
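The MotionCraft entry above hinges on warping a diffusion model's noise latent with an optical flow field. Below is a rough sketch of such a warp using bilinear resampling; the latent shape (4x64x64, as in Stable Diffusion), the backward-warping convention, and the helper name are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical optical-flow warp of a diffusion noise latent; not MotionCraft's code.
import torch
import torch.nn.functional as F

def warp_latent(latent, flow):
    """Warp a latent tensor (B, C, H, W) by a flow field (B, 2, H, W), in latent cells."""
    b, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow  # sample positions
    # Normalize to [-1, 1]; grid_sample expects (B, H, W, 2) ordered as (x, y).
    grid[:, 0] = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid[:, 1] = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(latent, grid.permute(0, 2, 3, 1), align_corners=True)

# Toy usage: shift a Stable-Diffusion-sized noise latent two cells to the right.
noise = torch.randn(1, 4, 64, 64)
flow = torch.zeros(1, 2, 64, 64)
flow[:, 0] = -2.0   # sampling 2 cells to the left moves content 2 cells to the right
next_noise = warp_latent(noise, flow)
print(next_noise.shape)  # torch.Size([1, 4, 64, 64])
```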
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
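The SMA entry above describes frequency-domain regularization of motion. A rough sketch in that spirit is shown below: compare the temporal Fourier spectra of frame-to-frame motion in a reference clip and a generated clip. The residual-based motion proxy and the simple L1 spectral loss are assumptions, not the paper's actual objective.

```python
# Hypothetical frequency-domain motion alignment loss; an illustration only.
import torch

def motion_spectrum(video):
    """video: (B, T, C, H, W) -> magnitude spectrum of frame differences over time."""
    motion = video[:, 1:] - video[:, :-1]          # crude per-frame motion proxy
    return torch.fft.rfft(motion, dim=1).abs()     # temporal frequency magnitudes

def spectral_alignment_loss(generated, reference):
    """Encourage the generated clip to reproduce the reference motion spectrum."""
    return (motion_spectrum(generated) - motion_spectrum(reference)).abs().mean()

ref = torch.randn(1, 16, 3, 32, 32)
gen = torch.randn(1, 16, 3, 32, 32, requires_grad=True)
loss = spectral_alignment_loss(gen, ref)
loss.backward()
print(float(loss))
```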
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
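The VMC entry above uses residual vectors between consecutive frames as a motion reference. Below is a minimal sketch of that residual idea: align the generated video's frame-to-frame residuals with the reference clip's. The cosine-similarity loss and the (B, T, C, H, W) layout are assumptions for illustration only, not the paper's distillation objective.

```python
# Hypothetical residual-vector motion alignment; an illustration, not VMC's objective.
import torch
import torch.nn.functional as F

def frame_residuals(video):
    """video: (B, T, C, H, W) -> residual vectors between consecutive frames."""
    return video[:, 1:] - video[:, :-1]

def motion_distillation_loss(generated, reference):
    """Penalize misalignment between generated and reference motion residuals."""
    g = frame_residuals(generated).flatten(2)   # (B, T-1, C*H*W)
    r = frame_residuals(reference).flatten(2)
    return (1.0 - F.cosine_similarity(g, r, dim=-1)).mean()

ref = torch.randn(1, 8, 3, 32, 32)
gen = torch.randn(1, 8, 3, 32, 32, requires_grad=True)
motion_distillation_loss(gen, ref).backward()
```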
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
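The last entry frames video synthesis as finding a trajectory in the latent space of a frozen, pre-trained image generator, with a separate motion generator producing the trajectory. A compact sketch of that formulation follows; the GRU-based motion model and the toy stand-in for the image generator are assumptions, not the paper's components.

```python
# Hypothetical latent-trajectory video synthesis sketch with a frozen image generator.
import torch
import torch.nn as nn

class MotionGenerator(nn.Module):
    """Predicts a latent trajectory z_0..z_{T-1} as residual steps from z_0."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.step = nn.Linear(latent_dim, latent_dim)

    def forward(self, z0, num_frames):
        z, h, trajectory = z0, None, [z0]
        for _ in range(num_frames - 1):
            out, h = self.rnn(z.unsqueeze(1), h)
            z = z + self.step(out.squeeze(1))    # content stays in z0, motion in the residuals
            trajectory.append(z)
        return torch.stack(trajectory, dim=1)    # (B, T, latent_dim)

# Frozen stand-in for a pre-trained image generator (e.g. a GAN decoder).
image_generator = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Tanh())
for p in image_generator.parameters():
    p.requires_grad_(False)

z0 = torch.randn(2, 128)
traj = MotionGenerator()(z0, num_frames=8)
frames = image_generator(traj).view(2, 8, 3, 32, 32)   # decode each latent into a frame
print(frames.shape)  # torch.Size([2, 8, 3, 32, 32])
```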