Motion Manipulation via Unsupervised Keypoint Positioning in Face Animation
- URL: http://arxiv.org/abs/2603.04302v1
- Date: Wed, 04 Mar 2026 17:21:28 GMT
- Title: Motion Manipulation via Unsupervised Keypoint Positioning in Face Animation
- Authors: Hong Li, Boyu Liu, Xuhui Liu, Baochang Zhang
- Abstract summary: We present Motion Manipulation via unsupervised keypoint positioning in Face Animation (MMFA). We first introduce self-supervised representation learning to encode and decode expressions in the latent feature space and decouple them from other motion information. We also design a variational autoencoder to map expression features to a continuous Gaussian distribution, allowing us for the first time to interpolate facial expressions in an unsupervised framework.
- Score: 22.055073645428738
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Face animation deals with controlling and generating facial features with a wide range of applications. The methods based on unsupervised keypoint positioning can produce realistic and detailed virtual portraits. However, they cannot achieve controllable face generation, since the existing keypoint decomposition pipelines fail to fully decouple identity semantics and intertwined motion information (e.g., rotation, translation, and expression). To address these issues, we present a new method, Motion Manipulation via unsupervised keypoint positioning in Face Animation (MMFA). We first introduce self-supervised representation learning to encode and decode expressions in the latent feature space and decouple them from other motion information. Secondly, we propose a new way to compute keypoints aiming to achieve arbitrary motion control. Moreover, we design a variational autoencoder to map expression features to a continuous Gaussian distribution, allowing us for the first time to interpolate facial expressions in an unsupervised framework. We have conducted extensive experiments on publicly available datasets to validate the effectiveness of MMFA, which show that MMFA offers pronounced advantages over prior art in creating realistic animation and manipulating face motion.
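The expression-interpolation idea described in the abstract can be sketched as follows. This is a minimal illustration of interpolating in a Gaussian VAE latent space, not the authors' implementation; the latent dimensionality and the placeholder latents are assumptions.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, sigma^2) via the VAE reparameterization trick."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def interpolate_expressions(z_a, z_b, steps=5):
    """Linearly interpolate between two expression latents.

    Because the VAE maps expressions to a continuous Gaussian latent
    space, intermediate points decode to plausible in-between
    expressions; a decoder (omitted here) would render each step.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    return [(1.0 - a) * z_a + a * z_b for a in alphas]

rng = np.random.default_rng(0)
# Placeholder 16-d latents standing in for encoded expressions.
z_neutral = reparameterize(np.zeros(16), np.zeros(16), rng)
z_smile = reparameterize(np.ones(16), np.zeros(16), rng)
path = interpolate_expressions(z_neutral, z_smile, steps=5)
print(len(path), path[0].shape)
```

The endpoints of the interpolation path reproduce the two input latents exactly, so a decoder would recover the original expressions at the boundaries.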
Related papers
- IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation [58.297199313494]
Implicit methods capture motion semantics directly from driving video, but suffer from identity leakage and entanglement between motion and appearance. We propose a novel implicit motion representation that compresses per-frame motion into compact 1D motion tokens. Our methodology employs a three-stage training strategy to enhance the training efficiency and ensure high fidelity.
arXiv Detail & Related papers (2026-02-07T11:17:20Z) - FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint [49.80464592726769]
We introduce FactorPortrait, a video diffusion method for controllable portrait animation. Our method animates the portrait by transferring facial expressions and head movements from the driving video. Our method outperforms existing approaches in realism, expressiveness, control accuracy, and view consistency.
arXiv Detail & Related papers (2025-12-12T15:22:52Z) - SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes [32.324844649352166]
Animation retargeting applies a sparse motion description to a character mesh to produce a semantically plausible and temporally coherent full-body sequence. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that is trained with only sparse motion representations. Our architecture comprises dedicated spatial and temporal gradient predictors, which are jointly trained in an end-to-end fashion.
arXiv Detail & Related papers (2025-04-07T08:42:52Z) - Multi-Keypoint Affordance Representation for Functional Dexterous Grasping [26.961157077703756]
We propose a multi-keypoint affordance representation for functional dexterous grasping. Our method encodes task-driven grasp configurations by localizing functional contact points. Our method significantly improves affordance localization accuracy, grasp consistency, and generalization to unseen tools and tasks.
arXiv Detail & Related papers (2025-02-27T11:54:53Z) - Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [79.4785166021062]
We introduce Puppet-Master, an interactive video generator that captures the internal, part-level motion of objects. We demonstrate that Puppet-Master learns to generate part-level motions, unlike other motion-conditioned video generators. Puppet-Master generalizes well to out-of-domain real images, outperforming existing methods on real-world benchmarks.
arXiv Detail & Related papers (2024-08-08T17:59:38Z) - X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z) - Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
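The Gaussian-heatmap encoding described above can be sketched as follows. This is a hedged illustration of rendering one such heatmap, not the paper's implementation; the resolution and bandwidth values are assumptions.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma):
    """Render a 2D Gaussian bump centered at `center` = (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Randomly sampled heatmaps like this one would be encoded into the
# generator's intermediate layers as a spatial inductive bias; at
# inference time a user edits the centers to move or remove objects.
rng = np.random.default_rng(42)
center = rng.uniform(0, 64, size=2)
hm = gaussian_heatmap(64, 64, center, sigma=6.0)
print(hm.shape)
```

Because the heatmap is a smooth function of the center coordinates, dragging a center translates the bump continuously, which is what makes the interactive layout editing intuitive.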
arXiv Detail & Related papers (2023-01-20T07:36:29Z) - Masked Motion Encoding for Self-Supervised Video Representation Learning [84.24773072241945]
We present Masked Motion MME, a new pre-training paradigm that reconstructs both appearance and motion information to explore temporal clues.
Motivated by the fact that human is able to recognize an action by tracking objects' position changes and shape changes, we propose to reconstruct a motion trajectory that represents these two kinds of change in the masked regions.
Pre-trained with our MME paradigm, the model is able to anticipate long-term and fine-grained motion details.
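The motion-trajectory reconstruction target described above can be sketched as follows. This is an assumed simplification that represents motion as per-step position changes of tracked points in masked regions; the array shapes and masking ratio are illustrative, not the paper's exact formulation.

```python
import numpy as np

def masked_trajectory_target(positions, mask):
    """Build a per-step displacement target for masked regions.

    positions: (T, N, 2) tracked point positions over T frames.
    mask: (N,) boolean, True where the region is masked and the model
    must reconstruct the motion trajectory (position changes).
    """
    displacements = np.diff(positions, axis=0)  # (T-1, N, 2)
    return displacements[:, mask, :]

T, N = 8, 10
rng = np.random.default_rng(1)
# Random-walk positions stand in for object points moving over time.
positions = np.cumsum(rng.normal(size=(T, N, 2)), axis=0)
mask = rng.random(N) < 0.75  # mask most regions, as in masked pre-training
target = masked_trajectory_target(positions, mask)
print(target.shape)
```

The model only receives a reconstruction loss on the masked entries, so it must infer long-term motion from the visible context rather than copying it.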
arXiv Detail & Related papers (2022-10-12T11:19:55Z) - Motion Transformer for Unsupervised Image Animation [37.35527776043379]
Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
arXiv Detail & Related papers (2022-09-28T12:04:58Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild [82.42401132933462]
We present a solution that works without the need for precise annotations of the gaze angle and the head pose.
Our method consists of three novel modules: the Gaze Correction module (GCM), the Gaze Animation module (GAM), and the Pretrained Autoencoder module (PAM).
arXiv Detail & Related papers (2020-08-09T23:14:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.