Related papers: FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models

FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models

URL: http://arxiv.org/abs/2312.04465v2
Date: Tue, 4 Jun 2024 11:08:25 GMT
Title: FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models
Authors: Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Stefanos Zafeiriou,
Abstract summary: We present FitDiff, a diffusion-based 3D facial avatar generative model. Our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars, that can be used as-is in common rendering engines.
Score: 79.65289816077629
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The remarkable progress in 3D face reconstruction has resulted in high-detail and photorealistic facial representations. Recently, Diffusion Models have revolutionized the capabilities of generative methods by surpassing the performance of GANs. In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. Leveraging diffusion principles, our model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image. The introduced multi-modal diffusion model is the first to concurrently output facial reflectance maps (diffuse and specular albedo and normals) and shapes, showcasing great generalization capabilities. It is solely trained on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses. Being the first 3D LDM conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars, that can be used as-is in common rendering engines, starting only from an unconstrained facial image, and achieving state-of-the-art performance.

Related papers

OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration. We propose OSDFace, a novel one-step diffusion model for face restoration. Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
3D Priors-Guided Diffusion for Blind Face Restoration [30.94188504133298]
Blind face restoration endeavors to restore a clear face image from a degraded counterpart. Recent approaches employing Generative Adversarial Networks (GANs) as priors have demonstrated remarkable success. We propose a novel diffusion-based framework by embedding the 3D facial priors as structure and identity constraints into a denoising diffusion process.
arXiv Detail & Related papers (2024-09-02T07:13:32Z)
AnimateMe: 4D Facial Expressions via Diffusion Models [72.63383191654357]
Recent advances in diffusion models have enhanced the capabilities of generative models in 2D animation. We employ Graph Neural Networks (GNNs) as denoising diffusion models in a novel approach, formulating the diffusion process directly on the mesh space. This facilitates the generation of facial deformations through a mesh-diffusion-based model.
arXiv Detail & Related papers (2024-03-25T21:40:44Z)
A Generative Framework for Self-Supervised Facial Representation Learning [18.094262972295702]
Self-supervised representation learning has gained increasing attention for strong generalization ability without relying on paired datasets. Self-supervised facial representation learning remains unsolved due to the coupling of facial identities, expressions, and external factors like pose and light. We propose LatentFace, a novel generative framework for self-supervised facial representations.
arXiv Detail & Related papers (2023-09-15T09:34:05Z)
Chupa: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models [9.479195068754507]
We propose a 3D generation pipeline that uses diffusion models to generate realistic human digital avatars. Our method, namely, Chupa, is capable of generating realistic 3D clothed humans with better perceptual quality and identity variety.
arXiv Detail & Related papers (2023-05-19T17:59:18Z)
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing [29.967273146028177]
DiffusionRig is a diffusion model conditioned on, or "rigged by," crude 3D face models. It learns to map simplistic renderings of 3D face models to realistic photos of a given person. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc.
arXiv Detail & Related papers (2023-04-13T17:58:00Z)
3DMM-RF: Convolutional Radiance Fields for 3D Face Modeling [111.98096975078158]
We introduce a style-based generative network that synthesizes in one pass all and only the required rendering samples of a neural radiance field. We show that this model can accurately be fit to "in-the-wild" facial images of arbitrary pose and illumination, extract the facial characteristics, and be used to re-render the face in controllable conditions.
arXiv Detail & Related papers (2022-09-15T15:28:45Z)
AvatarMe++: Facial Shape and BRDF Inference with Photorealistic Rendering-Aware GANs [119.23922747230193]
We introduce the first method that is able to reconstruct render-ready 3D facial geometry and BRDF from a single "in-the-wild" image. Our method outperforms the existing arts by a significant margin and reconstructs high-resolution 3D faces from a single low-resolution image.
arXiv Detail & Related papers (2021-12-11T11:36:30Z)
DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow. Our framework was trained and tested on two very large-scale facial video datasets. Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.