An Identity-Preserved Framework for Human Motion Transfer
- URL: http://arxiv.org/abs/2204.06862v3
- Date: Thu, 22 Feb 2024 15:29:23 GMT
- Title: An Identity-Preserved Framework for Human Motion Transfer
- Authors: Jingzhe Ma, Xiaoqing Zhang and Shiqi Yu
- Abstract summary: Human motion transfer (HMT) aims to generate a video clip for the target subject by imitating the source subject's motion.
Previous methods achieve good results in synthesizing high-quality videos, but they overlook the individualized motion information in the source and target motions.
We propose a novel identity-preserved HMT network, termed \textit{IDPres}.
- Score: 3.6286856791379463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human motion transfer (HMT) aims to generate a video clip for the target
subject by imitating the source subject's motion. Although previous methods
have achieved good results in synthesizing high-quality videos, they overlook
the individualized motion information in the source and target motions, which
is essential for the realism of the motion in the generated video. To address
this problem, we propose a novel identity-preserved HMT network, termed
\textit{IDPres}. This network is a skeleton-based approach that uniquely
incorporates the target's individualized motion and skeleton information to
augment identity representations. This integration significantly enhances the
realism of movements in the generated videos. Our method focuses on the
fine-grained disentanglement and synthesis of motion. To improve the
representation learning capability in latent space and facilitate the training
of \textit{IDPres}, we introduce three training schemes. These schemes enable
\textit{IDPres} to concurrently disentangle different representations and
accurately control them, ensuring the synthesis of ideal motions. To evaluate
the proportion of individualized motion information in the generated video, we
are the first to introduce a new quantitative metric called Identity Score
(\textit{ID-Score}), motivated by the success of gait recognition methods in
capturing identity information. Moreover, we collect an identity-motion paired
dataset, $Dancer101$, consisting of solo-dance videos of 101 subjects from the
public domain, providing a benchmark to prompt the development of HMT methods.
Extensive experiments demonstrate that the proposed \textit{IDPres} method
surpasses existing state-of-the-art techniques in terms of reconstruction
accuracy, realistic motion, and identity preservation.
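The abstract motivates ID-Score by gait recognition but does not state its formula. As a rough, hypothetical sketch, assuming access to a pretrained gait-recognition encoder, such a metric could be computed as the cosine similarity between identity embeddings of the generated clip and a real clip of the target subject; the `gait_embedding` placeholder below is an assumption for illustration, not the paper's implementation.

```python
# Hypothetical sketch only: the paper's ID-Score is "motivated by gait recognition",
# but the abstract does not give its formula. We assume a pretrained gait-recognition
# encoder and use cosine similarity between identity embeddings.
import numpy as np


def gait_embedding(pose_sequence: np.ndarray) -> np.ndarray:
    """Placeholder identity encoder: maps a (T, D) pose sequence to a 128-d vector.
    In practice this would be a pretrained gait-recognition network (assumption)."""
    seed = abs(hash(pose_sequence.tobytes())) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(128)


def id_score(generated_seq: np.ndarray, real_target_seq: np.ndarray) -> float:
    """Cosine similarity between the identity embedding of the generated clip and
    that of a real clip of the target subject; higher = more identity preserved."""
    g = gait_embedding(generated_seq)
    t = gait_embedding(real_target_seq)
    return float(g @ t / (np.linalg.norm(g) * np.linalg.norm(t) + 1e-8))


# Toy usage with random (60 frames x 34 keypoint coordinates) sequences.
generated = np.random.rand(60, 34)
real_target = np.random.rand(60, 34)
print(f"ID-Score (toy): {id_score(generated, real_target):.3f}")
```

Under this reading, a higher score would indicate that more of the target's individualized motion survives the transfer; the exact encoder and normalization used in the paper may differ.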
Related papers
- SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers [30.06494915665044]
We present SkyReels-A1, a framework built upon video diffusion Transformer to facilitate portrait image animation.
SkyReels-A1 capitalizes on the strong generative capabilities of video DiT, enhancing facial motion transfer precision, identity retention, and temporal coherence.
It is highly applicable to domains such as virtual avatars, remote communication, and digital media generation.
arXiv Detail & Related papers (2025-02-15T16:08:40Z) - Learning Semantic Facial Descriptors for Accurate Face Animation [43.370084532812044]
We introduce semantic facial descriptors in a learnable disentangled vector space to address this dilemma.
We obtain basis vector coefficients by employing an encoder on the source and driving faces, leading to effective facial descriptors in the identity and motion subspaces.
Our approach addresses both the limitations of model-based methods in preserving high-fidelity identity and the difficulty model-free methods face in accurate motion transfer.
arXiv Detail & Related papers (2025-01-29T15:40:42Z) - Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation [52.337472185022136]
We consider the task of Image-to-Video (I2V) generation, which involves transforming static images into realistic video sequences based on a textual description.
We propose a two-stage compositional framework that decomposes I2V generation into: (i) An explicit intermediate representation generation stage, followed by (ii) A video generation stage that is conditioned on this representation.
We evaluate our method on challenging benchmarks with multi-object and high-motion scenarios and empirically demonstrate that the proposed method achieves state-of-the-art consistency.
arXiv Detail & Related papers (2025-01-06T14:49:26Z) - MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation [7.474418338825595]
MotionCharacter is an efficient and high-fidelity human video generation framework.
We introduce an ID-preserving module to maintain identity fidelity while allowing flexible attribute modifications.
We also introduce ID-consistency and region-aware loss mechanisms, significantly enhancing identity consistency and detail fidelity.
arXiv Detail & Related papers (2024-11-27T12:15:52Z) - MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes [74.82911268630463]
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos.
MimicTalk exploits the rich knowledge from a NeRF-based person-agnostic generic model for improving the efficiency and robustness of personalized TFG.
Experiments show that our MimicTalk surpasses previous baselines regarding video quality, efficiency, and expressiveness.
arXiv Detail & Related papers (2024-10-09T10:12:37Z) - Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs [67.27840327499625]
We present a multimodal learning-based method to simultaneously synthesize co-speech facial expressions and upper-body gestures for digital characters.
Our approach learns from sparse face landmarks and upper-body joints, estimated directly from video data, to generate plausible emotive character motions.
arXiv Detail & Related papers (2024-06-26T04:53:11Z) - Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation [15.569467643817447]
We introduce a technique that concurrently learns both foreground and background dynamics by segregating their movements using distinct motion representations.
We train on real-world videos enhanced with this innovative motion depiction approach.
To further extend video generation to longer sequences without accumulating errors, we adopt a clip-by-clip generation strategy.
arXiv Detail & Related papers (2024-05-26T00:53:26Z) - AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding [24.486705010561067]
The paper introduces AniTalker, a framework designed to generate lifelike talking faces from a single portrait.
AniTalker effectively captures a wide range of facial dynamics, including subtle expressions and head movements.
arXiv Detail & Related papers (2024-05-06T02:32:41Z) - Customizing Motion in Text-to-Video Diffusion Models [79.4121510826141]
We introduce an approach for augmenting text-to-video generation models with customized motions.
By leveraging a few video samples demonstrating specific movements as input, our method learns and generalizes the input motion patterns for diverse, text-specified scenarios.
arXiv Detail & Related papers (2023-12-07T18:59:03Z) - SemanticBoost: Elevating Motion Generation with Augmented Textual Cues [73.83255805408126]
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z) - Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM).
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Flow Guided Transformable Bottleneck Networks for Motion Retargeting [29.16125343915916]
Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model.
Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention.
Inspired by the Transformable Bottleneck Network, we propose an approach based on an implicit volumetric representation of the image content.
arXiv Detail & Related papers (2021-06-14T21:58:30Z) - Hierarchical Style-based Networks for Motion Synthesis [150.226137503563]
We propose a self-supervised method for generating long-range, diverse and plausible behaviors to achieve a specific goal location.
Our method learns to model human motion by decomposing a long-range generation task in a hierarchical manner.
On a large-scale skeleton dataset, we show that the proposed method is able to synthesise long-range, diverse and plausible motion.
arXiv Detail & Related papers (2020-08-24T02:11:02Z)