Related papers: FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

URL: http://arxiv.org/abs/2312.03775v2
Date: Wed, 20 Dec 2023 12:59:33 GMT
Title: FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability
Authors: Linze Li, Sunqi Fan, Hengjun Pu, Zhaodong Bing, Yao Tang, Tianzhu Ye, Tong Yang, Liangyu Chen, Jiajun Liang
Abstract summary: We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities. This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models. Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
Score: 14.896554342627551
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontrollable human poses. To address these challenges, we introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities while ensuring frame consistency. This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models when incorporating a motion module. We propose two strategies towards this objective: training-free and training-based anchor frame methods. Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models, delivering substantial improvements over the original outcomes in terms of facial fidelity, text-to-image editability, and video motion. Moreover, we introduce conditional control using a 3D parametric face model to capture accurate facial movements and expressions. This solution augments the creative possibilities for facial animation generation through the integration of multiple control signals. For additional samples, please visit https://paper-faac.github.io/.

Related papers

SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers [30.06494915665044]
We present SkyReels-A1, a framework built upon a video diffusion Transformer (DiT) to facilitate portrait image animation. SkyReels-A1 capitalizes on the strong generative capabilities of video DiT, enhancing facial motion transfer precision, identity retention, and temporal coherence. It is highly applicable to domains such as virtual avatars, remote communication, and digital media generation.
arXiv Detail & Related papers (2025-02-15T16:08:40Z)
Towards Consistent and Controllable Image Synthesis for Face Editing [18.646961062736207]
RigFace is a novel approach to control the lighting, facial expression and head pose of a portrait photo. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
arXiv Detail & Related papers (2025-02-04T16:36:07Z)
Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos. Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm. We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models [69.50286698375386]
We propose a novel approach that better harnesses diffusion models for face-swapping. We introduce a mask shuffling technique during inpainting training, which allows us to create a so-called universal model for swapping. Ours is a relatively unified approach and so it is resilient to errors in other off-the-shelf models.
arXiv Detail & Related papers (2024-09-11T13:43:53Z)
G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation. Our novel approach empowers the face animation model to incorporate 3D information using only 2D images. In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z)
Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image. Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [40.71940056121056]
We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
arXiv Detail & Related papers (2023-12-03T14:17:11Z)
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving the reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion. We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z)
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN [49.917296433657484]
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image. In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities.
arXiv Detail & Related papers (2022-03-08T12:06:12Z)
PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models. The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications. Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
arXiv Detail & Related papers (2021-09-17T07:24:16Z)
UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing [78.26925404508994]
We propose a unified temporally consistent facial video editing framework termed UniFaceGAN. Our framework is designed to handle face swapping and face reenactment simultaneously. Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
arXiv Detail & Related papers (2021-08-12T10:35:22Z)
Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose [23.211318473026243]
We propose a self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. Experimental results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.
arXiv Detail & Related papers (2020-03-29T06:45:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.