Image Conductor: Precision Control for Interactive Video Synthesis
- URL: http://arxiv.org/abs/2406.15339v1
- Date: Fri, 21 Jun 2024 17:55:05 GMT
- Title: Image Conductor: Precision Control for Interactive Video Synthesis
- Authors: Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan
- Abstract summary: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
- Score: 90.2353794019393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/
Related papers
- MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation [65.74312406211213]
This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation.
By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis.
arXiv Detail & Related papers (2025-02-06T18:41:04Z)
- MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent [58.09607975296408]
We propose MotionAgent, enabling fine-grained motion control for text-guided image-to-video generation.
The key technique is the motion field agent that converts motion information in text prompts into explicit motion fields.
We construct a subset of VBench to evaluate the alignment between motion information in the text and the generated video, on which our method outperforms other advanced models in motion generation accuracy.
arXiv Detail & Related papers (2025-02-05T14:26:07Z)
- VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control [66.66226299852559]
VideoAnydoor is a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control.
To preserve detailed appearance while supporting fine-grained motion control, we design a pixel warper.
arXiv Detail & Related papers (2025-01-02T18:59:54Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- MotionBooth: Motion-Aware Customized Text-to-Video Generation [44.41894050494623]
MotionBooth is a framework designed for animating customized subjects with precise control over both object and camera movements.
We efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately.
Our approach presents subject region loss and video preservation loss to enhance the subject's learning performance.
arXiv Detail & Related papers (2024-06-25T17:42:25Z)
- MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios.
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.