Expression-preserving face frontalization improves visually assisted
speech processing
- URL: http://arxiv.org/abs/2204.02810v2
- Date: Thu, 7 Apr 2022 14:11:10 GMT
- Title: Expression-preserving face frontalization improves visually assisted
speech processing
- Authors: Zhiqi Kang, Mostafa Sadeghi, Radu Horaud and Xavier Alameda-Pineda
- Abstract summary: The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations.
We show that the method, when incorporated into deep learning pipelines, improves word recognition and speech intelligibility scores by a considerable margin.
- Score: 35.647888055229956
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Face frontalization consists of synthesizing a frontally-viewed face from an
arbitrarily-viewed one. The main contribution of this paper is a frontalization
methodology that preserves non-rigid facial deformations in order to boost the
performance of visually assisted speech communication. The method alternates
between the estimation of (i) the rigid transformation (scale, rotation, and
translation) and (ii) the non-rigid deformation between an arbitrarily-viewed
face and a face model. The method has two important merits: it can deal with
non-Gaussian errors in the data and it incorporates a dynamical face
deformation model. For that purpose, we use the generalized Student
t-distribution in combination with a linear dynamic system in order to account
for both rigid head motions and time-varying facial deformations caused by
speech production. We propose to use the zero-mean normalized cross-correlation
(ZNCC) score to evaluate the ability of the method to preserve facial
expressions. The method is thoroughly evaluated and compared with several state
of the art methods, either based on traditional geometric models or on deep
learning. Moreover, we show that the method, when incorporated into deep
learning pipelines, namely lip reading and speech enhancement, improves word
recognition and speech intelligibility scores by a considerable margin.
Supplemental material is accessible at
https://team.inria.fr/robotlearn/research/facefrontalization-benchmark/
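For intuition, here is a minimal sketch (Python/NumPy) of the robust rigid-alignment step the abstract describes: a weighted Procrustes fit whose weights follow the Student t re-weighting rule, so correspondences with large residuals are down-weighted. All names are illustrative; this omits the paper's non-rigid deformation step and the linear dynamic system over time-varying deformations, so it is not the paper's full alternating algorithm.

```python
import numpy as np

def weighted_procrustes(src, dst, w):
    """Closed-form scale/rotation/translation minimizing the weighted
    sum of squared distances between corresponding 3-D point sets."""
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(0)
    mu_d = (w[:, None] * dst).sum(0)
    s, d = src - mu_s, dst - mu_d
    cov = (w[:, None] * d).T @ s                        # 3x3 weighted covariance
    U, sig, Vt = np.linalg.svd(cov)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    R = U @ S @ Vt
    scale = (sig * np.diag(S)).sum() / (w * (s ** 2).sum(1)).sum()
    t = mu_d - scale * R @ mu_s
    return scale, R, t

def robust_rigid_fit(src, dst, nu=3.0, sigma2=1.0, iters=20):
    """IRLS rigid alignment with Student-t weights: each iteration refits
    the transform, then down-weights points with large residuals, which is
    what gives robustness to non-Gaussian errors in the data (a sketch)."""
    w = np.ones(len(src))
    for _ in range(iters):
        scale, R, t = weighted_procrustes(src, dst, w)
        res2 = ((dst - (scale * src @ R.T + t)) ** 2).sum(1)
        w = (nu + 3.0) / (nu + res2 / sigma2)           # Student-t E-step weight
        sigma2 = (w * res2).sum() / (3.0 * w.sum())     # update noise scale
    return scale, R, t
```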
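The ZNCC evaluation score itself is standard; below is a generic implementation, assuming the frontalized face and its reference are given as same-sized grayscale arrays (the paper's exact protocol, e.g. which face regions it scores, may differ):

```python
import numpy as np

def zncc(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two same-sized images.

    Returns a value in [-1, 1]; 1 means identical up to an affine change
    of intensity (gain and offset), which is why it suits comparing a
    frontalized face against a ground-truth frontal view.
    """
    a = img_a.astype(np.float64).ravel()
    b = img_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:        # a constant image has no defined correlation
        return 0.0
    return float(a @ b / denom)
```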
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Knowledge-Enhanced Facial Expression Recognition with Emotional-to-Neutral Transformation [66.53435569574135]
Existing facial expression recognition methods typically fine-tune a pre-trained visual encoder using discrete labels.
We observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations.
We propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation.
arXiv Detail & Related papers (2024-09-13T07:28:57Z)
- High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model [89.29655924125461]
We propose a novel landmark-based diffusion model for talking face generation.
We first establish the less ambiguous mapping from audio to the landmark motion of the lips and jaw.
Then, we introduce an innovative conditioning module called TalkFormer to align the synthesized motion with the motion represented by landmarks.
arXiv Detail & Related papers (2024-08-10T02:58:28Z)
- TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting [21.474938045227702]
We introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis.
Our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.
arXiv Detail & Related papers (2024-04-23T17:55:07Z)
- eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis [3.2498796510544636]
We present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis.
Considering the motion induced by head variation as noise and the motion induced by facial expression as the relevant information, our model is trained to filter out the noisy motion.
The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face.
arXiv Detail & Related papers (2024-04-15T17:08:53Z)
- Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior [13.198105709331617]
We propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM.
This is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation.
We show that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods.
arXiv Detail & Related papers (2023-10-05T07:44:49Z)
- GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images [55.431697263581626]
We introduce a novel Geometry-aware Facial Expression Translation framework, which is based on parametric 3D facial representations and can stably decouple expression.
We achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability to various poses and complex textures.
arXiv Detail & Related papers (2023-08-07T09:03:35Z)
- A survey on facial image deblurring [3.6775758132528877]
Blurred facial images have a great impact on high-level vision tasks such as face recognition.
This paper surveys and summarizes recently published methods for facial image deblurring, most of which are based on deep learning.
We show the performance of classical methods on standard datasets and metrics, and briefly discuss the differences between model-based and learning-based methods.
arXiv Detail & Related papers (2023-02-10T02:24:56Z)
- Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks [24.07648367866321]
Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one.
The main contribution of this paper is a robust face alignment method that enables pixel-to-pixel warping.
An important merit of the proposed method is its ability to deal both with noise (small perturbations) and with outliers (large errors).
arXiv Detail & Related papers (2020-10-26T15:52:50Z)
- Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer [51.926868759681014]
We present a framework to predict the facial parameters based on a bone-driven face model (BDFM) under different views.
The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor.
arXiv Detail & Related papers (2020-08-20T09:49:13Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.