LaMD: Latent Motion Diffusion for Video Generation
- URL: http://arxiv.org/abs/2304.11603v1
- Date: Sun, 23 Apr 2023 10:32:32 GMT
- Title: LaMD: Latent Motion Diffusion for Video Generation
- Authors: Yaosi Hu, Zhenzhong Chen, Chong Luo
- Abstract summary: The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from stochastic dynamics to highly controllable movements.
- Score: 69.4111397077229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating coherent and natural movement is the key challenge in video
generation. This research proposes to condense video generation into a problem
of motion generation, to improve the expressiveness of motion and make video
generation more manageable. This can be achieved by breaking down the video
generation process into latent motion generation and video reconstruction. We
present a latent motion diffusion (LaMD) framework, which consists of a
motion-decomposed video autoencoder and a diffusion-based motion generator, to
implement this idea. Through careful design, the motion-decomposed video
autoencoder can compress patterns in movement into a concise latent motion
representation. Meanwhile, the diffusion-based motion generator is able to
efficiently generate realistic motion on a continuous latent space under
multi-modal conditions, at a cost that is similar to that of image diffusion
models. Results show that LaMD generates high-quality videos with a wide range
of motions, from stochastic dynamics to highly controllable movements. It
achieves new state-of-the-art performance on benchmark datasets, including
BAIR, Landscape and CATER-GENs, for Image-to-Video (I2V) and
Text-Image-to-Video (TI2V) generation. The source code of LaMD will be made
available soon.
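The decomposition described in the abstract can be illustrated with a minimal sketch (plain NumPy, not the paper's networks): a toy stand-in "autoencoder" that splits a clip into a content latent and a concise motion latent, plus the forward-noising and inversion algebra that a diffusion-based motion generator operates under. The codec, shapes, and schedule here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": T=4 frames of flattened 8x8 features (D=64).
video = rng.standard_normal((4, 64))

# --- Motion-decomposed "autoencoder" (toy stand-in, not the paper's network) ---
# Content latent: the first frame; motion latent: stacked frame-to-frame residuals.
def encode(v):
    content = v[0]
    motion = (v[1:] - v[:-1]).reshape(-1)  # concise motion representation
    return content, motion

def decode(content, motion):
    deltas = motion.reshape(-1, content.shape[0])
    frames = [content]
    for d in deltas:
        frames.append(frames[-1] + d)  # cumulatively re-apply the motion
    return np.stack(frames)

content, z0 = encode(video)
assert np.allclose(decode(content, z0), video)  # lossless for this toy codec

# --- Diffusion in the continuous motion-latent space ---
# Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps
betas = np.linspace(1e-4, 0.02, 100)
abar = np.cumprod(1.0 - betas)

t = 60
eps = rng.standard_normal(z0.shape)
z_t = np.sqrt(abar[t]) * z0 + np.sqrt(1.0 - abar[t]) * eps

# A trained motion generator would predict eps from (z_t, t, conditions);
# here we plug in the true eps just to show the deterministic inversion.
z0_hat = (z_t - np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(abar[t])
assert np.max(np.abs(z0_hat - z0)) < 1e-8
```

Because the motion latent is much smaller than the pixel volume, diffusing over it costs roughly what an image diffusion model costs, which is the efficiency argument the abstract makes.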
Related papers
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior into video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- ViMo: Generating Motions from Casual Videos [34.19904765033005]
We propose a novel Video-to-Motion-Generation framework (ViMo).
ViMo could leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions.
Striking experimental results demonstrate that the proposed model can generate natural motions even for videos with rapid movements, varying perspectives, or frequent occlusions.
arXiv Detail & Related papers (2024-08-13T03:57:35Z)
- MotionCraft: Physics-based Zero-Shot Video Generation [22.33113030344355]
MotionCraft is a new zero-shot video generator that crafts physics-based, realistic videos.
We show that MotionCraft is able to warp the noise latent space of an image diffusion model, such as Stable Diffusion, by applying an optical flow.
We compare our method with the state-of-the-art Text2Video-Zero, reporting qualitative and quantitative improvements.
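The core operation this summary names, warping a noise latent by an optical-flow field, can be sketched as a backward warp with nearest-neighbour sampling (a toy NumPy stand-in; MotionCraft's actual latents, flow fields, and interpolation are not specified here):

```python
import numpy as np

rng = np.random.default_rng(1)

H = W = 8
latent = rng.standard_normal((H, W))  # stands in for an image-diffusion noise latent

# Constant optical flow: move everything one pixel right and one pixel down.
flow = np.zeros((H, W, 2))
flow[..., 0] = 1.0  # dx
flow[..., 1] = 1.0  # dy

def warp_nearest(z, flow):
    """Backward warp with nearest-neighbour sampling: out[y, x] = z[y - dy, x - dx]."""
    ys, xs = np.mgrid[0:z.shape[0], 0:z.shape[1]]
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, z.shape[1] - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, z.shape[0] - 1)
    return z[src_y, src_x]

warped = warp_nearest(latent, flow)
# The interior moved rigidly by (1, 1); the latent's values are merely permuted,
# so its noise statistics are largely preserved, which is why such a warp can
# steer a pre-trained image diffusion model without retraining.
assert np.allclose(warped[1:, 1:], latent[:-1, :-1])
```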
arXiv Detail & Related papers (2024-05-22T11:44:57Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
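The motion-distillation idea in this summary, using residual vectors between consecutive frames as a motion reference, can be sketched as follows (toy NumPy latents; the loss form and shapes are assumptions, not VMC's exact objective):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy frame latents: T=5 frames, D=16 dims each (stand-ins for video-diffusion latents).
ref = rng.standard_normal((5, 16))               # reference-video latents
gen = ref + 0.1 * rng.standard_normal((5, 16))   # generated latents

def residuals(frames):
    """Frame-to-frame residual vectors, used as the motion reference."""
    return frames[1:] - frames[:-1]

# Distillation-style loss: match residuals rather than raw frames, so appearance
# can differ between videos while the motion pattern is preserved.
loss = np.mean((residuals(gen) - residuals(ref)) ** 2)
```

Matching residuals instead of frames is what makes the objective appearance-invariant: adding a constant offset to every generated frame leaves the loss unchanged.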
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
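The last entry's framing, video synthesis as a trajectory through the latent space of a frozen image generator, can be sketched with a toy stand-in (NumPy; the generator, trajectory model, and disentanglement here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(3)

def image_generator(z):
    """Stand-in for a pre-trained, fixed image generator G: latent -> 'image'."""
    W = np.ones((16, 4))  # frozen weights (toy)
    return np.tanh(z @ W.T)

# A motion generator proposes a trajectory through latent space; here, a straight
# line, so content (the start point) and motion (the direction) are disentangled
# by construction.
z_start = rng.standard_normal(4)
direction = rng.standard_normal(4)
T = 6
trajectory = [z_start + (t / (T - 1)) * direction for t in range(T)]
frames = np.stack([image_generator(z) for z in trajectory])
assert frames.shape == (6, 16)  # one generated "frame" per trajectory point
```

Since G stays fixed, only the (much smaller) trajectory model needs training, which is what lets an off-the-shelf image generator reach high-resolution video.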
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.