Related papers: Deep Learning Based Facial Retargeting Using Local Patches

URL: http://arxiv.org/abs/2601.08429v1
Date: Tue, 13 Jan 2026 10:56:15 GMT
Title: Deep Learning Based Facial Retargeting Using Local Patches
Authors: Yeonsoo Choi, Inyup Lee, Sihun Cha, Seonghyeon Kim, Sunjin Jung, Junyong Noh
Abstract summary: We propose a local patch-based method that transfers facial animations captured in a source performance video to a target stylized 3D character. Our method can successfully transfer the semantic meaning of source facial expressions to stylized characters with considerable variations in facial feature proportion.
Score: 14.93331485580316
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the era of digital animation, the quest to produce lifelike facial animations for virtual characters has led to the development of various retargeting methods. While retargeting facial motion between models of similar shapes has been very successful, challenges arise when the retargeting is performed on stylized or exaggerated 3D characters that deviate significantly from human facial structures. In this scenario, it is important to consider the target character's facial structure and possible range of motion to preserve the semantics assumed by the original facial motions after the retargeting. To achieve this, we propose a local patch-based retargeting method that transfers facial animations captured in a source performance video to a target stylized 3D character. Our method consists of three modules. The Automatic Patch Extraction Module extracts local patches from the source video frame. These patches are processed through the Reenactment Module to generate correspondingly re-enacted target local patches. The Weight Estimation Module calculates the animation parameters for the target character at every frame for the creation of a complete facial animation sequence. Extensive experiments demonstrate that our method can successfully transfer the semantic meaning of source facial expressions to stylized characters with considerable variations in facial feature proportion.
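
The three-module data flow described in the abstract can be sketched roughly as below. This is only an illustration of the per-frame ordering (extract patches, re-enact them, estimate animation parameters); it is not the authors' implementation. The class `RetargetingPipeline`, the `toy_*` stand-ins, the patch names "eyes" and "mouth", and the nested-list frame format are all hypothetical, and the real modules are learned networks whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Frame = List[List[float]]   # hypothetical: grayscale image as nested lists
Patch = List[List[float]]   # hypothetical: cropped region of a frame


@dataclass
class RetargetingPipeline:
    """Chains three stages per frame, mirroring the order in the abstract."""
    extract_patches: Callable[[Frame], Dict[str, Patch]]
    reenact: Callable[[Dict[str, Patch]], Dict[str, Patch]]
    estimate_weights: Callable[[Dict[str, Patch]], List[float]]

    def run(self, frames: Sequence[Frame]) -> List[List[float]]:
        animation = []
        for frame in frames:
            source_patches = self.extract_patches(frame)      # source local patches
            target_patches = self.reenact(source_patches)     # re-enacted target patches
            animation.append(self.estimate_weights(target_patches))  # per-frame parameters
        return animation


def crop(frame: Frame, top: int, left: int, size: int) -> Patch:
    """Cut a size x size square out of the frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


# Toy stand-ins (assumptions, not the paper's modules).
def toy_extract(frame: Frame) -> Dict[str, Patch]:
    return {"eyes": crop(frame, 0, 0, 2), "mouth": crop(frame, 2, 0, 2)}


def toy_reenact(patches: Dict[str, Patch]) -> Dict[str, Patch]:
    return dict(patches)  # identity: a real module would synthesize target patches


def toy_estimate(patches: Dict[str, Patch]) -> List[float]:
    # One scalar per patch (mean intensity), ordered by patch name.
    weights = []
    for name in sorted(patches):
        values = [v for row in patches[name] for v in row]
        weights.append(sum(values) / len(values))
    return weights


if __name__ == "__main__":
    frame = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]
    pipeline = RetargetingPipeline(toy_extract, toy_reenact, toy_estimate)
    print(pipeline.run([frame, frame]))
```

Passing the stages in as callables keeps the sketch agnostic about how each module is actually realized; any of the toy functions could be swapped for a trained model without changing `run`.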

Related papers

Wan-Animate: Unified Character Animation and Replacement with Holistic Replication [53.619006977292635]
We introduce Wan-Animate, a unified framework for character animation and replacement. It can animate the character by precisely replicating the expressions and movements of the character in the video to generate high-fidelity character videos. It can integrate the animated character into the reference video to replace the original character, replicating the scene's lighting and color tone.
arXiv Detail & Related papers (2025-09-17T15:00:57Z)
FaceShot: Bring Any Character into Life [26.99093361595318]
FaceShot is a training-free portrait animation framework designed to bring any character into life from any driven video without fine-tuning or retraining. We achieve this by offering robust landmark sequences from an appearance-guided landmark matching module and a coordinate-based landmark matching module. With this powerful generalization capability, FaceShot can significantly extend the application of portrait animation.
arXiv Detail & Related papers (2025-03-02T05:35:57Z)
Identity-Preserving Pose-Guided Character Animation via Facial Landmarks Transformation [5.591489936998095]
We introduce the Facial Landmarks Transformation method, which leverages a 3D Morphable Model to address this limitation. The method converts 2D landmarks into a 3D face model, adjusts the 3D face model to align with the reference identity, and then transforms it back into 2D landmarks. This approach ensures accurate alignment with reference facial geometry, enhancing the consistency between generated videos and reference images.
arXiv Detail & Related papers (2024-12-12T06:13:32Z)
Replace Anyone in Videos [82.37852750357331]
We present the ReplaceAnyone framework, which focuses on localized human replacement and insertion featuring intricate backgrounds. We formulate this task as an image-conditioned video inpainting paradigm with pose guidance, utilizing a unified end-to-end video diffusion architecture. The proposed ReplaceAnyone can be seamlessly applied not only to traditional 3D-UNet base models but also to DiT-based video models such as Wan2.1.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model [45.0201701977516]
Video-driven 3D facial animation transfer aims to drive avatars to reproduce the expressions of actors. We propose FreeAvatar, a robust facial animation transfer method that relies solely on our learned expression representation.
arXiv Detail & Related papers (2024-09-20T03:17:01Z)
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis. It captures the complex one-to-many relationships between speech and 3D face based on diffusion. It simultaneously achieves more realistic facial animation than the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space [38.940128217895115]
We propose Versatile Face Animator, which combines facial motion capture with motion in an end-to-end manner, eliminating the need for blendshapes or rigs. Our method has the following two main characteristics: 1) we propose an RGBD animation module to learn facial motion from raw RGBD videos by hierarchical motion dictionaries and animate RGBD images rendered from 3D facial mesh coarse-to-fine, enabling facial animation on arbitrary 3D characters. Comprehensive experiments demonstrate the effectiveness of our proposed framework in generating impressive 3D facial animation results.
arXiv Detail & Related papers (2023-08-11T11:29:01Z)
Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync the input audio without considering the identity-specific speaking style and facial idiosyncrasies of the target actor. We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video. We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. Our approach ensures highly accurate lip motion while also producing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eyebrow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z)
Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation [50.10989443332995]
Pose-guided person image generation and animation aim to transform a source person image to target poses. Convolutional Neural Networks are limited by the lack of ability to spatially transform the inputs. We propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level.
arXiv Detail & Related papers (2020-08-27T08:59:44Z)
A Robust Interactive Facial Animation Editing System [0.0]
We propose a new learning-based approach to easily edit a facial animation from a set of intuitive control parameters. We use a resolution-preserving fully convolutional neural network that maps control parameters to blendshape coefficient sequences. The proposed system is robust and can handle coarse, exaggerated edits from non-specialist users.
arXiv Detail & Related papers (2020-07-18T08:31:02Z)
Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting [22.24046752858929]
We propose an end-to-end framework that jointly learns a personalized face model per user and per-frame facial motion parameters. Specifically, we learn user-specific expression blendshapes and dynamic (expression-specific) albedo maps by predicting personalized corrections. Experimental results show that our personalization accurately captures fine-grained facial dynamics in a wide range of conditions.
arXiv Detail & Related papers (2020-07-14T01:30:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.