Related papers: MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

URL: http://arxiv.org/abs/2405.20222v3
Date: Thu, 11 Jul 2024 16:26:03 GMT
Title: MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Authors: Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng
Abstract summary: MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals. We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline. After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
Score: 78.11258752076046
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present MOFA-Video, an advanced controllable image animation method that generates video from the given image using various additional controllable signals (such as human landmarks reference, manual trajectories, and another even provided video) or their combinations. This is different from previous methods which only can work on a specific motion domain or show weak control abilities with diffusion prior. To achieve our goal, we design several domain-aware motion field adapters (i.e., MOFA-Adapters) to control the generated motions in the video generation pipeline. For MOFA-Adapters, we consider the temporal motion consistency of the video and generate the dense motion flow from the given sparse control conditions first, and then, the multi-scale features of the given image are wrapped as a guided feature for stable video diffusion generation. We naively train two motion adapters for the manual trajectories and the human landmarks individually since they both contain sparse information about the control. After training, the MOFA-Adapters in different domains can also work together for more controllable video generation. Project Page: https://myniuuu.github.io/MOFA_Video/

Related papers

Moaw: Unleashing Motion Awareness for Video Diffusion Models [71.34328578845721]
Moaw is a framework that unleashes motion awareness for video diffusion models. We train a diffusion model for motion perception, shifting its modality from image-to-video generation to video-to-dense-tracking. We then construct a motion-labeled dataset to identify features that encode the strongest motion information, and inject them into a structurally identical video generation model.
arXiv Detail & Related papers (2026-01-19T06:45:46Z)
ATI: Any Trajectory Instruction for Controllable Video Generation [25.249489701215467]
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion. Our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models.
arXiv Detail & Related papers (2025-05-28T23:49:18Z)
LMP: Leveraging Motion Prior in Zero-Shot Video Generation with Diffusion Transformer [10.44905923812975]
We propose the Leveraging Motion Prior (LMP) framework for zero-shot video generation. Our framework harnesses the powerful generative capabilities of pre-trained diffusion transformers to enable motion in the generated videos to reference user-provided motion videos. Our approach achieves state-of-the-art performance in generation quality, prompt-video consistency, and control capability.
arXiv Detail & Related papers (2025-05-20T10:18:29Z)
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer [44.33224798292861]
ConMo is a framework that disentangles and recomposes the motions of subjects and camera movements. It enables more accurate motion control across diverse subjects and improves performance in multi-subject scenarios. ConMo unlocks a wide range of applications, including subject size and position editing, subject removal, semantic modifications, and camera motion simulation.
arXiv Detail & Related papers (2025-04-03T10:15:52Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new context. Multimodal representations from recaptioned prompt and video frames promote the modeling of appearance. Our method effectively learns specific motion pattern from singular or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
AnimateAnything: Consistent and Controllable Animation for Video Generation [24.576022028967195]
We present a unified controllable video generation approach AnimateAnything. It facilitates precise and consistent video manipulation across various conditions. Experiments demonstrate that our method outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2024-11-16T16:36:49Z)
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control [42.506988751934685]
We present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory. Specifically, we introduce reference attention, which leverages the model's inherent capabilities for subject learning. We devise a mask-guided motion module to achieve precise motion control by fully utilizing the robust motion signal of box masks.
arXiv Detail & Related papers (2024-10-17T17:52:57Z)
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as camera's pan and zoom movements. For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters. Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling [62.19142543520805]
Motion-I2V is a framework for consistent and controllable image-to-video generation. It factorizes I2V into two stages with explicit motion modeling. Motion-I2V's second stage naturally supports zero-shot video-to-video translation.
arXiv Detail & Related papers (2024-01-29T09:06:43Z)
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions [94.03133100056372]
Moonshot is a new video generation model that conditions simultaneously on multimodal inputs of image and text. The model can be easily repurposed for a variety of generative applications, such as personalized video generation, image animation and video editing.
arXiv Detail & Related papers (2024-01-03T16:43:47Z)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion. We propose MotionDirector, with a dual-path LoRAs architecture to decouple the learning of appearance and motion. Our method also supports various downstream applications, such as the mixing of different videos with their appearance and motion respectively, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator. Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.