VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
- URL: http://arxiv.org/abs/2503.21781v1
- Date: Thu, 27 Mar 2025 17:59:58 GMT
- Title: VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
- Authors: Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, Yu-Chiang Frank Wang
- Abstract summary: VideoMage employs subject and motion LoRAs to capture personalized content from user-provided images and videos. We develop a spatial-temporal composition scheme to guide interactions among subjects within the desired motion patterns.
- Score: 24.004996738924902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Customized text-to-video generation aims to produce high-quality videos that incorporate user-specified subject identities or motion patterns. However, existing methods mainly focus on personalizing a single concept, either subject identity or motion pattern, limiting their effectiveness for multiple subjects with the desired motion patterns. To tackle this challenge, we propose a unified framework VideoMage for video customization over both multiple subjects and their interactive motions. VideoMage employs subject and motion LoRAs to capture personalized content from user-provided images and videos, along with an appearance-agnostic motion learning approach to disentangle motion patterns from visual appearance. Furthermore, we develop a spatial-temporal composition scheme to guide interactions among subjects within the desired motion patterns. Extensive experiments demonstrate that VideoMage outperforms existing methods, generating coherent, user-controlled videos with consistent subject identities and interactions.
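To make the LoRA-based customization concrete, the sketch below composes two independently trained low-rank updates, one for subject identity and one for motion, on top of a single frozen projection layer. The class names, ranks, and scales are illustrative assumptions and do not reflect VideoMage's actual implementation.

```python
# Minimal sketch: composing separately trained subject and motion LoRAs
# on a frozen base weight. Names, ranks, and scales are assumptions.
import torch
import torch.nn as nn

class LoRADelta(nn.Module):
    """Low-rank update delta_W = B @ A applied on top of a frozen weight."""
    def __init__(self, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))

    def delta(self) -> torch.Tensor:
        return self.B @ self.A  # (out_features, in_features)

class ComposedLinear(nn.Module):
    """Frozen base linear layer plus a weighted sum of LoRA deltas."""
    def __init__(self, base: nn.Linear, loras, scales):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.loras = nn.ModuleList(loras)
        self.scales = scales

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.base.weight + sum(
            s * l.delta() for s, l in zip(self.scales, self.loras)
        )
        return nn.functional.linear(x, weight, self.base.bias)

# Usage: one attention projection customized by a subject LoRA and a
# motion LoRA (hypothetical ranks and mixing scales).
base = nn.Linear(320, 320)
subject_lora = LoRADelta(320, 320, rank=4)
motion_lora = LoRADelta(320, 320, rank=4)
layer = ComposedLinear(base, [subject_lora, motion_lora], scales=[1.0, 0.8])
out = layer(torch.randn(2, 77, 320))
print(out.shape)  # torch.Size([2, 77, 320])
```

In this picture, the spatial-temporal composition scheme would additionally decide where and when each subject LoRA influences the denoising process; that part is not modeled here.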
Related papers
- MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation [65.74312406211213]
This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis.
arXiv Detail & Related papers (2025-02-06T18:41:04Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method enabling video generation of similar motion in new contexts. Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance. Our method effectively learns specific motion patterns from a single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects [61.323597069037056]
Current approaches for personalizing text-to-video generation struggle to handle multiple subjects.
We propose CustomVideo, a novel framework that can generate identity-preserving videos with the guidance of multiple subjects.
arXiv Detail & Related papers (2024-01-18T13:23:51Z)
- NewMove: Customizing text-to-video models with novel motions [74.9442859239997]
We introduce an approach for augmenting text-to-video generation models with customized motions. By leveraging a few video samples demonstrating specific movements as input, our method learns and generalizes the input motion patterns for diverse, text-specified scenarios.
arXiv Detail & Related papers (2023-12-07T18:59:03Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
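DreamVideo's motion-learning stage, as summarized above, fine-tunes a motion adapter on the given videos. A minimal sketch of such an adapter, assuming a residual bottleneck that mixes information only across frames, is given below; the architecture and dimensions are hypothetical, not the paper's design.

```python
# Hypothetical sketch of a motion adapter: a residual bottleneck applied along
# the temporal axis of video features, fine-tuned while the backbone is frozen.
import torch
import torch.nn as nn

class MotionAdapter(nn.Module):
    def __init__(self, channels: int, hidden: int = 64):
        super().__init__()
        # 1D convolutions mix information across frames only.
        self.down = nn.Conv1d(channels, hidden, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        self.up = nn.Conv1d(hidden, channels, kernel_size=3, padding=1)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels) per-frame features
        h = x.transpose(1, 2)            # (batch, channels, frames)
        h = self.up(self.act(self.down(h)))
        return x + h.transpose(1, 2)     # residual temporal update

# Only the adapter's parameters would be optimized during motion learning.
adapter = MotionAdapter(channels=320)
features = torch.randn(1, 16, 320)       # 16 frames of pooled features
print(adapter(features).shape)           # torch.Size([1, 16, 320])
```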
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
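The motion distillation objective summarized for VMC uses residual vectors between consecutive frames as a motion reference. A minimal sketch of that reference signal is shown below, comparing temporal differences of predicted and reference latents with a mean-squared error; the actual VMC objective and the tensors it operates on may differ.

```python
# Minimal sketch: frame-to-frame residuals as a motion reference signal.
# Only illustrates comparing temporal differences rather than the frames
# themselves; not VMC's exact loss.
import torch
import torch.nn.functional as F

def frame_residuals(latents: torch.Tensor) -> torch.Tensor:
    """latents: (batch, frames, channels, height, width) -> residuals between
    consecutive frames, shape (batch, frames - 1, channels, height, width)."""
    return latents[:, 1:] - latents[:, :-1]

def motion_reference_loss(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # Align only the motion (temporal differences), not appearance.
    return F.mse_loss(frame_residuals(pred), frame_residuals(ref))

pred = torch.randn(1, 8, 4, 32, 32, requires_grad=True)
ref = torch.randn(1, 8, 4, 32, 32)
loss = motion_reference_loss(pred, ref)
loss.backward()
print(float(loss))
```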
- VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning on Language-Video Foundation Models [43.46536102838717]
VideoDreamer is a novel framework for customized multi-subject text-to-video generation.
It can generate temporally consistent text-guided videos that faithfully preserve the visual features of the given multiple subjects.
arXiv Detail & Related papers (2023-11-02T04:38:50Z)
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [24.282240656366714]
Motion Customization aims to adapt existing text-to-video diffusion models to generate videos with customized motion.
We propose MotionDirector, a dual-path LoRA architecture that decouples the learning of appearance and motion.
Our method also supports downstream applications such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions.
arXiv Detail & Related papers (2023-10-12T16:26:18Z)
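The mixing application mentioned for MotionDirector, combining the appearance of one video with the motion of another, can be pictured as routing separately trained LoRA weights by layer type: spatial-layer entries taken from one checkpoint and temporal-layer entries from another. The key-naming convention below is purely hypothetical.

```python
# Hypothetical sketch: route LoRA entries to spatial vs. temporal layers so
# that appearance and motion can come from different checkpoints.
import torch

def mix_lora_state_dicts(appearance_lora: dict, motion_lora: dict) -> dict:
    """Keep spatial-layer entries from one LoRA checkpoint and temporal-layer
    entries from another. Keys containing 'spatial' / 'temporal' are assumed
    for illustration only."""
    mixed = {}
    for key, value in appearance_lora.items():
        if "spatial" in key:
            mixed[key] = value
    for key, value in motion_lora.items():
        if "temporal" in key:
            mixed[key] = value
    return mixed

# Toy checkpoints with made-up key names.
appearance_lora = {"block1.attn_spatial.lora_A": torch.randn(4, 320),
                   "block1.attn_temporal.lora_A": torch.randn(4, 320)}
motion_lora = {"block1.attn_spatial.lora_A": torch.randn(4, 320),
               "block1.attn_temporal.lora_A": torch.randn(4, 320)}
mixed = mix_lora_state_dicts(appearance_lora, motion_lora)
print(sorted(mixed.keys()))
```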