Face Animation with an Attribute-Guided Diffusion Model
- URL: http://arxiv.org/abs/2304.03199v1
- Date: Thu, 6 Apr 2023 16:22:32 GMT
- Title: Face Animation with an Attribute-Guided Diffusion Model
- Authors: Bohan Zeng, Xuhui Liu, Sicheng Gao, Boyu Liu, Hong Li, Jianzhuang Liu,
Baochang Zhang
- Abstract summary: We propose a Face Animation framework with an attribute-guided Diffusion Model (FADM).
FADM is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation.
- Score: 41.43427420949979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face animation has achieved much progress in computer vision. However,
prevailing GAN-based methods suffer from unnatural distortions and artifacts
due to sophisticated motion deformation. In this paper, we propose a Face
Animation framework with an attribute-guided Diffusion Model (FADM), which is
the first work to exploit the superior modeling capacity of diffusion models
for photo-realistic talking-head generation. To mitigate the uncontrollable
synthesis effect of the diffusion model, we design an Attribute-Guided
Conditioning Network (AGCN) to adaptively combine the coarse animation features
and 3D face reconstruction results, which can incorporate appearance and motion
conditions into the diffusion process. These specific designs help FADM rectify
unnatural artifacts and distortions, and also enrich high-fidelity facial
details through iterative diffusion refinements with accurate animation
attributes. FADM can flexibly and effectively improve existing animation
videos. Extensive experiments on widely used talking-head benchmarks validate
the effectiveness of FADM over prior art.
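The abstract describes the core mechanism only at a high level: a conditioning network (AGCN) fuses coarse animation features with 3D face reconstruction attributes and injects the result into the diffusion denoising process. The sketch below illustrates that general pattern of attribute-conditioned denoising in PyTorch; the module names, shapes, fusion strategy, and training step are assumptions made for illustration and are not the authors' implementation.

```python
# Minimal sketch (not FADM's code): a denoiser conditioned on a fused
# attribute vector, trained with the standard DDPM epsilon-prediction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeConditionedDenoiser(nn.Module):
    """Predicts the noise added to a frame, conditioned on fused attributes."""
    def __init__(self, img_channels=3, cond_dim=128, hidden=64):
        super().__init__()
        # Fuse coarse-animation features with 3D reconstruction attributes
        # (e.g. pose/expression coefficients) into one conditioning vector.
        self.fuse = nn.Sequential(nn.Linear(cond_dim * 2, cond_dim), nn.SiLU())
        self.time_embed = nn.Sequential(nn.Linear(1, cond_dim), nn.SiLU())
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + cond_dim, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, img_channels, 3, padding=1),
        )

    def forward(self, x_t, t, coarse_feat, recon_attr):
        cond = self.fuse(torch.cat([coarse_feat, recon_attr], dim=-1))
        cond = cond + self.time_embed(t[:, None].float())
        # Broadcast the conditioning vector over the spatial dimensions.
        cond_map = cond[:, :, None, None].expand(-1, -1, x_t.shape[2], x_t.shape[3])
        return self.net(torch.cat([x_t, cond_map], dim=1))

def ddpm_training_step(model, x0, coarse_feat, recon_attr, alphas_cumprod):
    """One epsilon-prediction DDPM training step with attribute conditioning."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,))
    a_bar = alphas_cumprod[t][:, None, None, None]
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion
    pred = model(x_t, t, coarse_feat, recon_attr)
    return F.mse_loss(pred, noise)

# Example usage with assumed shapes and a placeholder noise schedule:
# model = AttributeConditionedDenoiser()
# alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)  # placeholder schedule
# x0 = torch.randn(4, 3, 64, 64); feat = torch.randn(4, 128); attr = torch.randn(4, 128)
# loss = ddpm_training_step(model, x0, feat, attr, alphas_cumprod)
```

In this reading, swapping the tiny convolutional stack for a proper U-Net and the random tensors for real coarse-animation and 3D-reconstruction encoders would bring the sketch closer to a practical setup.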
Related papers
- DiffuEraser: A Diffusion Model for Video Inpainting [13.292164408616257]
We introduce DiffuEraser, a video inpainting model based on stable diffusion, to fill masked regions with greater details and more coherent structures.
We also expand the temporal receptive fields of both the prior model and DiffuEraser, and further enhance consistency by leveraging the temporal smoothing property of Video Diffusion Models.
arXiv Detail & Related papers (2025-01-17T08:03:02Z)
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer [95.80384464922147]
Continuous visual generation requires the full-sequence diffusion-based approach.
We present ACDiT, an Autoregressive blockwise Conditional Diffusion Transformer.
We demonstrate that ACDiT can be seamlessly used in visual understanding tasks despite being trained on the diffusion objective.
arXiv Detail & Related papers (2024-12-10T18:13:20Z)
- Towards motion from video diffusion models [10.493424298717864]
We propose to synthesize human motion by deforming an SMPL-X body representation guided by Score Distillation Sampling (SDS) calculated using a video diffusion model.
By analyzing the fidelity of the resulting animations, we gain insights into the extent to which we can obtain motion using publicly available text-to-video diffusion models.
arXiv Detail & Related papers (2024-11-19T19:35:28Z)
- AnimateMe: 4D Facial Expressions via Diffusion Models [72.63383191654357]
Recent advances in diffusion models have enhanced the capabilities of generative models in 2D animation.
We employ Graph Neural Networks (GNNs) as denoising diffusion models in a novel approach, formulating the diffusion process directly on the mesh space.
This facilitates the generation of facial deformations through a mesh-diffusion-based model.
arXiv Detail & Related papers (2024-03-25T21:40:44Z)
- FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models [79.65289816077629]
We present FitDiff, a diffusion-based 3D facial avatar generative model.
Our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image.
Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in common rendering engines.
arXiv Detail & Related papers (2023-12-07T17:35:49Z)
- FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability [14.896554342627551]
We introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities.
This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models.
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models.
arXiv Detail & Related papers (2023-12-06T02:55:35Z)
- Motion-Conditioned Diffusion Model for Controllable Video Synthesis [75.367816656045]
We introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes.
We show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
arXiv Detail & Related papers (2023-04-27T17:59:32Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate them as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
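As a side note on the DiffMAE entry above, the sketch below shows one common way to condition a diffusion denoiser on the visible (unmasked) part of an image; the masking scheme, the tiny network, and the crude timestep encoding are assumptions for illustration and do not reproduce that paper's architecture.

```python
# Minimal sketch (not DiffMAE's code): denoise only the masked region of an
# image while conditioning on the clean visible pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConditionalDenoiser(nn.Module):
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        # Inputs: noisy masked region, clean visible region, the mask, and a
        # crudely encoded timestep channel.
        self.net = nn.Sequential(
            nn.Conv2d(channels * 2 + 2, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t, visible, mask, t_frac):
        t_map = t_frac[:, None, None, None].expand(-1, 1, x_t.shape[2], x_t.shape[3])
        return self.net(torch.cat([x_t, visible, mask, t_map], dim=1))

def masked_diffusion_loss(model, x0, mask, alphas_cumprod):
    """Epsilon-prediction loss on masked pixels, conditioned on visible pixels."""
    b, num_steps = x0.shape[0], alphas_cumprod.shape[0]
    t = torch.randint(0, num_steps, (b,))
    a_bar = alphas_cumprod[t][:, None, None, None]
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion
    visible = x0 * (1 - mask)                               # clean, unmasked context
    pred = model(x_t * mask, visible, mask, t.float() / num_steps)
    return F.mse_loss(pred * mask, noise * mask)            # loss on masked pixels only
```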