MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
- URL: http://arxiv.org/abs/2411.18281v2
- Date: Sat, 30 Nov 2024 07:34:24 GMT
- Title: MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
- Authors: Haopeng Fang, Di Qiu, Binjie Mao, Pengfei Yan, He Tang
- Abstract summary: MotionCharacter is an efficient and high-fidelity human video generation framework. We introduce an ID-preserving module to maintain identity fidelity while allowing flexible attribute modifications. We also introduce ID-consistency and region-aware loss mechanisms, significantly enhancing identity consistency and detail fidelity.
- Score: 7.474418338825595
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in personalized Text-to-Video (T2V) generation highlight the importance of integrating character-specific identities and actions. However, previous T2V models struggle with identity consistency and controllable motion dynamics, mainly due to limited fine-grained facial and action-based textual prompts, and datasets that overlook key human attributes and actions. To address these challenges, we propose MotionCharacter, an efficient and high-fidelity human video generation framework designed for identity preservation and fine-grained motion control. We introduce an ID-preserving module to maintain identity fidelity while allowing flexible attribute modifications, and further integrate ID-consistency and region-aware loss mechanisms, significantly enhancing identity consistency and detail fidelity. Additionally, our approach incorporates a motion control module that prioritizes action-related text while maintaining subject consistency, along with a dataset, Human-Motion, which utilizes large language models to generate detailed motion descriptions. To simplify user control during inference, we parameterize motion intensity through a single coefficient, allowing for easy adjustments. Extensive experiments highlight the effectiveness of MotionCharacter, demonstrating significant improvements in identity-preserving, high-quality video generation.
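Two of the ideas above lend themselves to a compact illustration: conditioning generation on a single motion-intensity coefficient, and an ID-consistency loss over face embeddings. Below is a minimal, hypothetical PyTorch sketch; the module names, shapes, and exact loss form are assumptions for illustration, not MotionCharacter's released implementation.

```python
# Hypothetical sketch: scalar motion-intensity conditioning and an
# ID-consistency loss. Names and shapes are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionIntensityEmbed(nn.Module):
    """Map a single scalar intensity in [0, 1] to a conditioning vector."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, intensity: torch.Tensor) -> torch.Tensor:
        # intensity: (batch,) -> (batch, dim); added to the condition stream
        return self.mlp(intensity.unsqueeze(-1))

def id_consistency_loss(gen_emb: torch.Tensor, ref_emb: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity between face embeddings of generated frames
    and the reference identity image."""
    return (1.0 - F.cosine_similarity(gen_emb, ref_emb, dim=-1)).mean()

# One scalar per sample controls motion strength at inference time.
cond = MotionIntensityEmbed()(torch.tensor([0.2, 0.9]))  # weak vs. strong motion
loss = id_consistency_loss(torch.randn(2, 512), torch.randn(2, 512))
```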
Related papers
- DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation [63.781450025764904]
DynamiCtrl is a novel framework that explores different pose-guided structures in MM-DiT.
We propose Pose-adaptive Layer Norm (PadaLN), which utilizes adaptive layer normalization to encode sparse pose features; a rough sketch follows this entry.
By leveraging text, we not only enable fine-grained control over the generated content, but also, for the first time, achieve simultaneous control over both background and motion.
arXiv Detail & Related papers (2025-03-27T08:07:45Z)
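For intuition, here is what a pose-conditioned adaptive layer norm in the spirit of PadaLN could look like. This is a minimal sketch under assumed shapes and a dense pose embedding; it is not the authors' implementation.

```python
# Sketch of pose-conditioned adaptive layer normalization (PadaLN-style).
# The conditioning pathway and dimensions are assumptions.
import torch
import torch.nn as nn

class PoseAdaptiveLayerNorm(nn.Module):
    def __init__(self, dim: int, pose_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # Predict per-channel scale and shift from the pose features.
        self.to_scale_shift = nn.Linear(pose_dim, 2 * dim)

    def forward(self, x: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); pose: (batch, pose_dim)
        scale, shift = self.to_scale_shift(pose).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

x, pose = torch.randn(2, 16, 512), torch.randn(2, 128)
out = PoseAdaptiveLayerNorm(512, 128)(x, pose)  # (2, 16, 512)
```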
- SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers [30.06494915665044]
We present SkyReels-A1, a framework built upon video diffusion Transformer to facilitate portrait image animation.
SkyReels-A1 capitalizes on the strong generative capabilities of video DiT, enhancing facial motion transfer precision, identity retention, and temporal coherence.
It is highly applicable to domains such as virtual avatars, remote communication, and digital media generation.
arXiv Detail & Related papers (2025-02-15T16:08:40Z)
- Motion by Queries: Identity-Motion Trade-offs in Text-to-Video Generation [47.61288672890036]
We investigate how self-attention query features govern motion, structure, and identity in text-to-video models.
Our analysis reveals that Q affects not only layout but, during denoising, also has a strong effect on subject identity (see the toy sketch after this entry).
We demonstrate two applications: (1) a zero-shot motion transfer method that is 20 times more efficient than existing approaches, and (2) a training-free technique for consistent multi-shot video generation.
arXiv Detail & Related papers (2024-12-10T18:49:39Z)
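The query-injection idea can be made concrete with a toy single-head attention sketch: recording Q from a source video's denoising pass and swapping it into the target generation transfers motion, while K and V keep the target's appearance. Purely illustrative; not the paper's code.

```python
# Toy single-head self-attention with optional query injection.
import torch
import torch.nn.functional as F

def attention_with_query_injection(x, w_q, w_k, w_v, q_src=None):
    # x: (batch, tokens, dim); w_q/w_k/w_v: (dim, dim) projections
    q = x @ w_q if q_src is None else q_src  # inject source-video queries here
    k, v = x @ w_k, x @ w_v
    attn = F.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

d = 64
x_tgt, q_src = torch.randn(1, 8, d), torch.randn(1, 8, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out = attention_with_query_injection(x_tgt, w_q, w_k, w_v, q_src=q_src)
```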
- PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation [36.21554597804604]
Identity-specific human video generation with customized ID images is still under-explored.
We propose a novel framework, dubbed PersonalVideo, that applies direct supervision on videos synthesized by the T2V model.
Our method delivers high identity faithfulness while preserving the inherent video generation quality of the original T2V model, outperforming prior approaches.
arXiv Detail & Related papers (2024-11-26T02:25:38Z)
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding [76.30210465222218]
MotionGPT-2 is a unified Large Motion-Language Model (LMLM).
It supports multimodal control conditions through pre-trained Large Language Models (LLMs).
It is highly adaptable to the challenging 3D holistic motion generation task.
arXiv Detail & Related papers (2024-10-29T05:25:34Z)
- MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning [30.61146302275139]
We introduce a Motion-Identity Modulated Appearance Learning Module (MIA) that modulates CLIP features at both motion and identity levels.
We also design an Inter-clip Affinity Learning Module (ICA) to model temporal relationships across clips.
Our method achieves precise facial motion control (i.e., expressions and gaze) and faithful identity preservation, and generates animation videos that maintain both intra- and inter-clip temporal consistency.
arXiv Detail & Related papers (2024-09-23T16:33:53Z)
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule determined by the significance of each motion token; a toy illustration follows this entry.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
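As a toy illustration of a significance-aware noise schedule for discrete diffusion (loosely inspired by the summary above; M2DM's actual schedule may differ), low-priority motion tokens can be corrupted earlier in the forward process:

```python
# Toy significance-aware corruption order for discrete diffusion.
import torch

def corruption_mask(importance: torch.Tensor, t: float) -> torch.Tensor:
    # importance: (tokens,) scores; t: diffusion time in [0, 1].
    # Rank-normalize so high-importance tokens are masked last.
    ranks = importance.argsort().argsort().float() / (importance.numel() - 1)
    return t > ranks  # True = token replaced by [MASK]

imp = torch.tensor([0.9, 0.1, 0.5, 0.7])
print(corruption_mask(imp, t=0.5))  # the two least important tokens are masked
```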
- An Identity-Preserved Framework for Human Motion Transfer [3.6286856791379463]
Human motion transfer (HMT) aims to generate a video clip for the target subject by imitating the source subject's motion.
Previous methods achieve good results on high-quality videos but overlook individualized motion information from the source and target motions.
We propose a novel identity-preserved HMT network, termed IDPres.
arXiv Detail & Related papers (2022-04-14T10:27:19Z)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
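A simplified sketch of the AMP-style reward shaping described above: the agent maximizes a weighted sum of the task reward and a style reward derived from the motion discriminator. The paper's exact discriminator objective and weighting differ in detail; this is illustrative only.

```python
# Illustrative AMP-style reward shaping (not the paper's exact formulation).
import torch

def amp_reward(task_reward: torch.Tensor, disc_logit: torch.Tensor,
               w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    # Style reward is high when the discriminator believes the transition
    # came from the reference motion dataset.
    style = -torch.log(torch.clamp(1.0 - torch.sigmoid(disc_logit), min=1e-4))
    return w_task * task_reward + w_style * style

r = amp_reward(torch.tensor([1.0]), torch.tensor([2.0]))
```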