Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
- URL: http://arxiv.org/abs/2402.03162v2
- Date: Mon, 6 May 2024 05:37:20 GMT
- Title: Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
- Authors: Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao
- Abstract summary: We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
- Score: 34.404342332033636
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent text-to-video diffusion models have achieved impressive progress. In practice, users often desire the ability to control object motion and camera movement independently for customized video creation. However, current methods lack the focus on separately controlling object motion and camera movement in a decoupled manner, which limits the controllability and flexibility of text-to-video models. In this paper, we introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as camera's pan and zoom movements, as if directing a video. We propose a simple yet effective strategy for the decoupled control of object motion and camera movement. Object motion is controlled through spatial cross-attention modulation using the model's inherent priors, requiring no additional optimization. For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters. We further employ an augmentation-based approach to train these layers in a self-supervised manner on a small-scale dataset, eliminating the need for explicit motion annotation. Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios. Extensive experiments demonstrate the superiority and effectiveness of our method. Project page and code are available at https://direct-a-video.github.io/.
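The abstract names two mechanisms without giving details. The following is a hypothetical NumPy sketch of both ideas as read from the summary, not the authors' code (their code is linked from the project page). `box_modulated_attention` adds a simple additive bias to one text token's pre-softmax cross-attention logits inside a user-drawn box and subtracts it outside. `simulate_camera` turns a still frame into a pan/zoom clip by sliding and resizing a crop window, so the camera parameters come for free as training labels, which is the essence of augmentation-based self-supervision. All function names, the additive bias, and the `(cx, cy, cz)` conventions are assumptions; the paper's exact modulation and parameterization may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def box_modulated_attention(logits, token_idx, box_mask, strength=2.0):
    """Hypothetical training-free modulation of one frame's spatial cross-attention.

    logits:   (num_pixels, num_tokens) pre-softmax scores (pixel queries vs. text keys)
    box_mask: (num_pixels,) bool, True for pixels inside the user's box
    """
    logits = logits.copy()                      # leave the caller's array untouched
    logits[box_mask, token_idx] += strength     # amplify the object token inside the box
    logits[~box_mask, token_idx] -= strength    # suppress it outside the box
    return softmax(logits, axis=-1)

def simulate_camera(frame, num_frames, cx, cy, cz, crop=0.5, out=(32, 32)):
    """Hypothetical pan/zoom augmentation of a single frame.

    cx, cy: total pan over the clip, as a fraction of the base crop width/height
    cz:     final zoom factor (> 1 zooms in by shrinking the crop window)
    Returns the clip and its (cx, cy, cz) label.
    """
    H, W = frame.shape[:2]
    clip = []
    for t in np.linspace(0.0, 1.0, num_frames):
        scale = cz ** t                               # geometric zoom interpolation
        ch = min(crop * H / scale, H)                 # window never exceeds the image
        cw = min(crop * W / scale, W)
        yc = H / 2 + (t - 0.5) * cy * crop * H        # window center moves linearly
        xc = W / 2 + (t - 0.5) * cx * crop * W
        y0 = np.clip(yc - ch / 2, 0, H - ch)          # keep the window inside the image
        x0 = np.clip(xc - cw / 2, 0, W - cw)
        # nearest-neighbour resample of the window to a fixed output size
        ys = (y0 + (np.arange(out[0]) + 0.5) * ch / out[0]).astype(int)
        xs = (x0 + (np.arange(out[1]) + 0.5) * cw / out[1]).astype(int)
        clip.append(frame[np.ix_(ys, xs)])
    return np.stack(clip), np.array([cx, cy, cz])
```

In a full system the modulation would run inside the model's spatial cross-attention during denoising, and the `(cx, cy, cz)` label would condition the new temporal cross-attention layers; neither step is shown here.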
Related papers
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame large video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates.
Our work is the first to enable camera control for transformer-based video diffusion models.
(2024-07-17T17:59:05Z)
- MotionBooth: Motion-Aware Customized Text-to-Video Generation [44.41894050494623]
MotionBooth is a framework designed for animating customized subjects with precise control over both object and camera movements.
We efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately.
Our approach introduces a subject region loss and a video preservation loss to improve how well the subject is learned.
(2024-06-25T17:42:25Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
(2024-06-21T17:55:05Z)
- MotionClone: Training-Free Motion Cloning for Controllable Video Generation [41.621147782128396]
MotionClone is a training-free framework that clones motion from reference videos for versatile motion-controlled video generation.
MotionClone handles both global camera motion and local object motion, with notable gains in motion fidelity, textual alignment, and temporal consistency.
(2024-06-08T03:44:25Z)
- MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
(2024-04-24T10:28:54Z)
- Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts [67.5094490054134]
We propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click.
Our framework offers simpler yet more precise user control and better generation performance than previous methods.
(2024-03-13T05:44:37Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
(2023-12-08T16:31:04Z)
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
(2023-12-06T17:49:57Z)
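The VD3D summary mentions conditioning on Plücker coordinates. As a rough illustration of that term only, here is a generic per-pixel Plücker ray embedding in NumPy; it is not VD3D's implementation. The world-to-camera convention `x_cam = R @ x_world + t`, pixel centers at half-integer coordinates, and the (moment, direction) channel order are assumptions, and other works use different orderings or sign conventions.

```python
import numpy as np

def plucker_embedding(K, R, t, H, W):
    """Per-pixel Plücker ray embedding, shape (H, W, 6) as (moment, direction).

    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation (assumed convention)
    """
    o = -R.T @ t                                   # camera center in world coordinates
    v, u = np.meshgrid(np.arange(H) + 0.5, np.arange(W) + 0.5, indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel centers (H, W, 3)
    d = pix @ np.linalg.inv(K).T @ R                # back-project, then rotate into world frame
    d /= np.linalg.norm(d, axis=-1, keepdims=True)  # unit ray directions
    m = np.cross(o, d)                              # moment o x d (broadcasts over pixels)
    return np.concatenate([m, d], axis=-1)
```

Because the moment is the cross product of a point on the ray with its direction, it is always orthogonal to the direction, and it vanishes when the camera sits at the world origin.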