Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
- URL: http://arxiv.org/abs/2406.01900v3
- Date: Fri, 7 Jun 2024 02:57:36 GMT
- Title: Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
- Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen
- Abstract summary: Follow-Your-Emoji is a diffusion-based framework for portrait animation.
It animates a reference portrait with target landmark sequences.
Our method demonstrates strong performance in controlling the expressions of freestyle portraits.
- Score: 53.767090490974745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation that animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to it while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equips the powerful Stable Diffusion model with two well-designed techniques. First, we adopt a new explicit motion signal, the expression-aware landmark, to guide the animation process. We find that this landmark not only ensures accurate motion alignment between the reference portrait and the target motion during inference, but also increases the ability to portray exaggerated expressions (e.g., large pupil movements) and avoids identity leakage. Second, we propose a facial fine-grained loss that uses both expression and facial masks to improve the model's ability to perceive subtle expressions and to reconstruct the appearance of the reference portrait. Accordingly, our method performs strongly in controlling the expressions of freestyle portraits, including real humans, cartoons, sculptures, and even animals. By leveraging a simple and effective progressive generation strategy, we extend our model to stable long-term animation, increasing its potential application value. To address the lack of a benchmark for this field, we introduce EmojiBench, a comprehensive benchmark comprising diverse portrait images, driving videos, and landmarks. Extensive evaluations on EmojiBench verify the superiority of Follow-Your-Emoji.
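The abstract describes two mechanisms concretely enough to sketch in code: the mask-weighted facial fine-grained loss and the progressive generation strategy for long videos. Below is a minimal PyTorch sketch of the loss, assuming the standard noise-prediction objective of latent diffusion; the additive weighting scheme, the weights `w_face` and `w_expr`, and the mask sources are assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a facial fine-grained loss. The paper only states that
# expression and facial masks sharpen the training objective; the base
# noise-prediction loss and the weighting below are assumptions.
import torch
import torch.nn.functional as F

def facial_fine_grained_loss(eps_pred, eps_true, face_mask, expr_mask,
                             w_face=1.0, w_expr=2.0):
    """Reweight the diffusion noise-prediction loss inside facial regions.

    eps_pred, eps_true: (B, C, H, W) predicted / ground-truth noise in latent space.
    face_mask, expr_mask: (B, 1, H, W) binary masks over the whole face and over
        expression-critical regions (eyes, mouth), resized to the latent size.
    """
    per_pixel = F.mse_loss(eps_pred, eps_true, reduction="none")  # (B, C, H, W)
    # Base term over the full latent, plus extra weight on masked regions.
    weight = 1.0 + w_face * face_mask + w_expr * expr_mask
    return (per_pixel * weight).mean()
```

A similarly hedged sketch of the progressive generation strategy, assuming it amounts to sliding-window sampling in which each clip is conditioned on the tail frames of the previous clip; `animate_clip`, its `motion_context` parameter, and the overlap length are hypothetical, not the paper's API.

```python
# Hedged sketch of progressive long-term animation via overlapping windows.
# `animate_clip` stands in for a hypothetical per-clip diffusion sampler.
def animate_long_sequence(reference, landmarks, clip_len=16, overlap=4):
    frames, prev_tail = [], None
    for start in range(0, len(landmarks), clip_len - overlap):
        chunk = landmarks[start:start + clip_len]
        clip = animate_clip(reference, chunk, motion_context=prev_tail)
        # Drop the overlapping frames already emitted by the previous clip.
        frames.extend(clip if prev_tail is None else clip[overlap:])
        prev_tail = clip[-overlap:]
    return frames
```

Conditioning each window on previously generated frames is a common way to keep long diffusion-sampled videos temporally stable without letting memory grow with sequence length.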
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z) - AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation [4.568539181254851]
We propose AniPortrait, a framework for generating high-quality animation driven by audio and a reference portrait image.
Experimental results demonstrate the superiority of AniPortrait in terms of facial naturalness, pose diversity, and visual quality.
Our methodology exhibits considerable flexibility and controllability, and can be effectively applied in areas such as facial motion editing or face reenactment.
arXiv Detail & Related papers (2024-03-26T13:35:02Z) - FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z) - X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z) - FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability [14.896554342627551]
We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities.
This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models.
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
arXiv Detail & Related papers (2023-12-06T02:55:35Z) - MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation [49.4220768835379]
AdaMesh is a novel adaptive speech-driven facial animation approach.
It learns the personalized talking style from a reference video of about 10 seconds.
It generates vivid facial expressions and head poses.
arXiv Detail & Related papers (2023-10-11T06:56:08Z) - FACTS: Facial Animation Creation using the Transfer of Styles [10.141085397402314]
We present a novel approach to facial animation by taking existing animations and allowing for the modification of style characteristics.
Specifically, we explore the use of a StarGAN to enable the conversion of 3D facial animations into different emotions and person-specific styles.
arXiv Detail & Related papers (2023-07-18T17:58:22Z) - Continuously Controllable Facial Expression Editing in Talking Face Videos [34.83353695337335]
Speech-related expressions and emotion-related expressions are often highly coupled.
Traditional image-to-image translation methods cannot work well in our application.
We propose a high-quality facial expression editing method for talking face videos.
arXiv Detail & Related papers (2022-09-17T09:05:47Z) - Language-Guided Face Animation by Recurrent StyleGAN-based Generator [87.56260982475564]
We study a novel task, language-guided face animation, which aims to animate a static face image with the help of language.
We propose a recurrent motion generator that extracts a sequence of semantic and motion cues from the language and feeds them, along with visual information, to a pre-trained StyleGAN to generate high-quality frames.
arXiv Detail & Related papers (2022-08-11T02:57:30Z)