MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
- URL: http://arxiv.org/abs/2312.03641v2
- Date: Tue, 16 Jul 2024 17:27:10 GMT
- Title: MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
- Authors: Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan
- Abstract summary: Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
- Score: 77.09621778348733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. Accurate control of both camera and object motion is essential for video generation. However, existing works either mainly focus on one type of motion or do not clearly distinguish between the two, limiting their control capabilities and diversity. Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion. The architecture and training strategy of MotionCtrl are carefully devised, taking into account the inherent properties of camera motion, object motion, and imperfect training data. Compared to previous methods, MotionCtrl offers three main advantages: 1) It effectively and independently controls camera motion and object motion, enabling more fine-grained motion control and facilitating flexible and diverse combinations of both types of motion. 2) Its motion conditions are determined by camera poses and trajectories, which are appearance-free and minimally impact the appearance or shape of objects in generated videos. 3) It is a relatively generalizable model that can adapt to a wide array of camera poses and trajectories once trained. Extensive qualitative and quantitative experiments have been conducted to demonstrate the superiority of MotionCtrl over existing methods. Project Page: https://wzhouxiff.github.io/projects/MotionCtrl/
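The abstract notes that MotionCtrl's motion conditions are appearance-free: camera motion is specified by per-frame camera poses and object motion by point trajectories. A minimal sketch of what such conditioning inputs could look like (the exact formats are an assumption for illustration, not the paper's published interface; `make_camera_condition` and `make_trajectory_condition` are hypothetical helpers):

```python
import numpy as np

def make_camera_condition(num_frames=16):
    """Camera motion as one pose per frame: a flattened 3x4 [R|t] matrix.

    Here, a simple rightward pan: identity rotation, translation
    increasing along x. Neither colors nor shapes are encoded, which is
    what makes the condition appearance-free.
    """
    poses = []
    for i in range(num_frames):
        R = np.eye(3)                          # no rotation
        t = np.array([0.05 * i, 0.0, 0.0])     # pan right over time
        poses.append(np.hstack([R, t[:, None]]).flatten())  # shape (12,)
    return np.stack(poses)  # (num_frames, 12)

def make_trajectory_condition(num_frames=16, start=(0.2, 0.5), end=(0.8, 0.5)):
    """Object motion as per-frame normalized (x, y) points along a path."""
    xs = np.linspace(start[0], end[0], num_frames)
    ys = np.linspace(start[1], end[1], num_frames)
    return np.stack([xs, ys], axis=1)  # (num_frames, 2)

cam = make_camera_condition()
traj = make_trajectory_condition()
print(cam.shape, traj.shape)  # (16, 12) (16, 2)
```

Because the two conditions live in separate representations, they can be supplied independently or combined, which mirrors the independent-control claim in the abstract.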
Related papers
- Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation [21.87745390965703]
We introduce 3D-aware motion representation and propose an image animation framework, called Perception-as-Control, to achieve fine-grained collaborative motion control.
Specifically, we construct 3D-aware motion representation from a reference image, manipulate it based on interpreted user intentions, and perceive it from different viewpoints.
In this way, camera and object motions are transformed into intuitive, consistent visual changes.
arXiv Detail & Related papers (2025-01-09T07:23:48Z)
- Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions [78.65431951506152]
We introduce a synthetic dataset for Free-Form Motion Control (SynFMC).
The proposed SynFMC dataset includes diverse objects and environments and covers various motion patterns according to specific rules.
We further propose a method, Free-Form Motion Control (FMC), which enables independent or simultaneous control of object and camera movements.
arXiv Detail & Related papers (2025-01-02T18:59:45Z)
- ObjCtrl-2.5D: Training-free Object Control with Camera Poses [61.23620424598908]
This study aims to achieve more precise and versatile object control in image-to-video (I2V) generation.
We present ObjCtrl-2.5D, a training-free object control approach that uses a 3D trajectory, extended from a 2D trajectory with depth information, as its control signal.
Experiments demonstrate that ObjCtrl-2.5D significantly improves object control accuracy compared to training-free methods.
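The entry above describes extending a 2D trajectory into 3D using depth. One standard way to do this lift is pinhole back-projection with a depth map and camera intrinsics; the sketch below assumes that formulation (the function name, intrinsics `K`, and depth source are illustrative, not the paper's exact pipeline):

```python
import numpy as np

def lift_trajectory_to_3d(traj_2d, depth_map, K):
    """Lift 2D pixel trajectory points to 3D camera coordinates.

    traj_2d:   iterable of (u, v) pixel coordinates
    depth_map: H x W array of per-pixel depth values
    K:         3x3 camera intrinsics matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    points_3d = []
    for (u, v) in traj_2d:
        z = depth_map[int(v), int(u)]      # depth at the trajectory point
        x = (u - cx) * z / fx              # back-project via pinhole model
        y = (v - cy) * z / fy
        points_3d.append((x, y, z))
    return np.array(points_3d)             # (num_points, 3)

# Synthetic example: constant depth of 1.0, simple intrinsics.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
depth = np.ones((64, 64))
print(lift_trajectory_to_3d([(10, 20), (30, 40)], depth, K))
```

With depth attached, the same screen-space path can distinguish an object moving toward or away from the camera, which a flat 2D trajectory cannot express.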
arXiv Detail & Related papers (2024-12-10T18:14:30Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- MotionBooth: Motion-Aware Customized Text-to-Video Generation [44.41894050494623]
MotionBooth is a framework designed for animating customized subjects with precise control over both object and camera movements.
We efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately.
Our approach introduces a subject region loss and a video preservation loss to enhance subject learning.
arXiv Detail & Related papers (2024-06-25T17:42:25Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.