Face Animation with an Attribute-Guided Diffusion Model
- URL: http://arxiv.org/abs/2304.03199v1
- Date: Thu, 6 Apr 2023 16:22:32 GMT
- Title: Face Animation with an Attribute-Guided Diffusion Model
- Authors: Bohan Zeng, Xuhui Liu, Sicheng Gao, Boyu Liu, Hong Li, Jianzhuang Liu, Baochang Zhang
- Abstract summary: We propose a Face Animation framework with an attribute-guided Diffusion Model (FADM).
FADM is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation.
- Score: 41.43427420949979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face animation has achieved much progress in computer vision. However,
prevailing GAN-based methods suffer from unnatural distortions and artifacts
due to sophisticated motion deformation. In this paper, we propose a Face
Animation framework with an attribute-guided Diffusion Model (FADM), which is
the first work to exploit the superior modeling capacity of diffusion models
for photo-realistic talking-head generation. To mitigate the uncontrollable
synthesis effect of the diffusion model, we design an Attribute-Guided
Conditioning Network (AGCN) to adaptively combine the coarse animation features
and 3D face reconstruction results, which can incorporate appearance and motion
conditions into the diffusion process. These specific designs help FADM rectify
unnatural artifacts and distortions, and also enrich high-fidelity facial
details through iterative diffusion refinements with accurate animation
attributes. FADM can flexibly and effectively improve existing animation
videos. Extensive experiments on widely used talking-head benchmarks validate
the effectiveness of FADM over prior arts.
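The abstract names two ideas without giving equations: an adaptive combination of coarse animation features with 3D face reconstruction results, and iterative diffusion refinement under that condition. The following is a minimal NumPy sketch of that general pattern, not the authors' AGCN or FADM implementation. The sigmoid gate, the function names `adaptive_fuse` and `ddpm_reverse_step`, the feature shapes, the noise schedule, and the toy noise predictor are all assumptions made for illustration; the reverse step is the standard DDPM ancestral sampling update.

```python
import numpy as np


def adaptive_fuse(coarse_feat, recon_feat, w_gate, b_gate):
    """Blend two condition feature maps of shape (C, H, W) with a per-pixel gate.

    Hypothetical stand-in for an attribute-guided conditioning step:
    a 1x1 channel mix over the stacked features yields a sigmoid gate.
    """
    stacked = np.concatenate([coarse_feat, recon_feat], axis=0)  # (2C, H, W)
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = 1.0 / (1.0 + np.exp(-logits))  # values in (0, 1)
    return gate * coarse_feat + (1.0 - gate) * recon_feat


def ddpm_reverse_step(x_t, t, cond, eps_model, betas, rng):
    """One standard DDPM reverse step using a conditioned noise predictor."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    eps = eps_model(x_t, t, cond)  # predicted noise, same shape as x_t
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


# Toy usage: random features and an untrained placeholder noise predictor.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
coarse = rng.standard_normal((C, H, W))  # assumed coarse animation features
recon = rng.standard_normal((C, H, W))   # assumed 3D reconstruction features
cond = adaptive_fuse(coarse, recon, rng.standard_normal((C, 2 * C)) * 0.1, np.zeros(C))

betas = np.linspace(1e-4, 0.02, 50)      # assumed linear noise schedule
toy_eps_model = lambda x_t, t, c: x_t - c  # placeholder, not a trained network

x = rng.standard_normal((C, H, W))
for t in reversed(range(len(betas))):
    x = ddpm_reverse_step(x, t, cond, toy_eps_model, betas, rng)
```

In a real system the gate weights and the noise predictor would be learned networks; here they are random or hand-written so that the control flow can be run end to end.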
Related papers
- Towards motion from video diffusion models [10.493424298717864]
We propose to synthesize human motion by deforming an SMPL-X body representation guided by score distillation sampling (SDS) calculated using a video diffusion model.
By analyzing the fidelity of the resulting animations, we gain insights into the extent to which we can obtain motion using publicly available text-to-video diffusion models.
arXiv Detail & Related papers (2024-11-19T19:35:28Z)
- MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model.
We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines.
Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z)
- AnimateMe: 4D Facial Expressions via Diffusion Models [72.63383191654357]
Recent advances in diffusion models have enhanced the capabilities of generative models in 2D animation.
We employ Graph Neural Networks (GNNs) as denoising diffusion models in a novel approach, formulating the diffusion process directly on the mesh space.
This facilitates the generation of facial deformations through a mesh-diffusion-based model.
arXiv Detail & Related papers (2024-03-25T21:40:44Z)
- FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models [79.65289816077629]
We present FitDiff, a diffusion-based 3D facial avatar generative model.
Our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image.
Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines.
arXiv Detail & Related papers (2023-12-07T17:35:49Z)
- FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability [14.896554342627551]
We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities.
This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models.
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
arXiv Detail & Related papers (2023-12-06T02:55:35Z)
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians [51.46168990249278]
We present an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video.
GaussianAvatar is validated on both the public dataset and our collected dataset.
arXiv Detail & Related papers (2023-12-04T18:55:45Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.