Rethinking Voice-Face Correlation: A Geometry View
- URL: http://arxiv.org/abs/2307.13948v1
- Date: Wed, 26 Jul 2023 04:03:10 GMT
- Title: Rethinking Voice-Face Correlation: A Geometry View
- Authors: Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha
Raj
- Abstract summary: We propose a voice-anthropometric measurement (AM)-face paradigm, which identifies predictable facial AMs from the voice and uses them to guide 3D face reconstruction.
We find significant correlations between voice and specific parts of the face geometry, such as the nasal cavity and cranium.
- Score: 34.94679112707095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous works on voice-face matching and voice-guided face synthesis
demonstrate strong correlations between voice and face, but mainly rely on
coarse semantic cues such as gender, age, and emotion. In this paper, we aim to
investigate the capability of reconstructing the 3D facial shape from voice
from a geometry perspective without any semantic information. We propose a
voice-anthropometric measurement (AM)-face paradigm, which identifies
predictable facial AMs from the voice and uses them to guide 3D face
reconstruction. By leveraging AMs as a proxy to link the voice and face
geometry, we can eliminate the influence of unpredictable AMs and make the face
geometry tractable. Our approach is evaluated on our proposed dataset with
ground-truth 3D face scans and corresponding voice recordings, and we find
significant correlations between voice and specific parts of the face geometry,
such as the nasal cavity and cranium. Our work offers a new perspective on
voice-face correlation and can serve as a good empirical study for
anthropometry science.
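The following is a minimal sketch, on synthetic data, of the AM-as-proxy idea described in the abstract: regress anthropometric measurements (AMs) from voice embeddings, keep only the AMs that prove predictable on held-out speakers, and pass those on as geometric constraints for 3D face reconstruction. The dimensions, the least-squares regressor, and the 0.3 threshold are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the voice -> AM -> face-geometry proxy paradigm (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_voice, n_ams = 400, 100, 64, 8
W_true = rng.normal(size=(d_voice, n_ams))
noise_scale = np.linspace(0.1, 5.0, n_ams)  # some AMs far noisier than others

def make_split(n):
    v = rng.normal(size=(n, d_voice))            # stand-in voice embeddings
    a = v @ W_true + rng.normal(size=(n, n_ams)) * noise_scale
    return v, a

v_tr, a_tr = make_split(n_train)
v_te, a_te = make_split(n_test)

# Joint least-squares regressor from voice embedding to all AMs.
W_hat, *_ = np.linalg.lstsq(v_tr, a_tr, rcond=None)
a_pred = v_te @ W_hat

# Per-AM Pearson correlation between prediction and ground truth; AMs above
# the (assumed) threshold count as "predictable" and would constrain the
# downstream 3D face reconstructor, screening out the unpredictable ones.
corr = np.array([np.corrcoef(a_pred[:, j], a_te[:, j])[0, 1] for j in range(n_ams)])
print("per-AM correlation:", np.round(corr, 2))
print("AMs kept as constraints:", np.flatnonzero(corr > 0.3))
```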
Related papers
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained
3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z) - Let's Get the FACS Straight -- Reconstructing Obstructed Facial Features [5.7843271011811614]
We propose to reconstruct obstructed facial parts to avoid the task of repeated fine-tuning.
By using the CycleGAN architecture, the requirement of matched pairs, which is often hard to fulfill, can be eliminated; a minimal sketch of this idea follows.
We show that scores similar to those of videos without obstructing sensors can be achieved.
arXiv Detail & Related papers (2023-11-09T09:09:20Z)
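Below is a minimal sketch of the cycle-consistency term that lets CycleGAN-style training work on unpaired domains (here, obstructed vs. unobstructed faces). The tiny generators and image sizes are placeholders; a full CycleGAN adds per-domain adversarial losses.

```python
# Cycle-consistency sketch for unpaired image-to-image translation.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in image-to-image generator (real CycleGANs use ResNet/U-Net)."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

G = TinyGenerator()  # obstructed -> unobstructed
F = TinyGenerator()  # unobstructed -> obstructed
l1 = nn.L1Loss()

obstructed = torch.rand(4, 3, 64, 64)    # unpaired batches from each domain
unobstructed = torch.rand(4, 3, 64, 64)

# Each round trip must reproduce its input: x -> G(x) -> F(G(x)) ~ x, and the
# reverse cycle; this is what removes the need for matched pairs.
loss_cycle = l1(F(G(obstructed)), obstructed) + l1(G(F(unobstructed)), unobstructed)
loss_cycle.backward()
print(float(loss_cycle))
```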
arXiv Detail & Related papers (2023-11-09T09:09:20Z) - DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with
Diffusion [68.85904927374165]
We propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis method.
It captures the complex one-to-many relationships between speech and 3D face based on diffusion.
It also achieves more realistic facial animation than state-of-the-art methods; a toy conditional denoising step is sketched below.
arXiv Detail & Related papers (2023-08-23T04:14:55Z)
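A hypothetical DDPM-style training step for speech-conditioned mesh denoising, illustrating how diffusion supports one-to-many generation: at sampling time, different noise seeds yield different plausible faces for the same audio. The mesh size, noise schedule, and MLP denoiser are assumptions, not DF-3DFace's architecture.

```python
# One conditional-diffusion training step (illustrative shapes and schedule).
import torch
import torch.nn as nn

n_vertices = 468                       # assumed mesh size
d_mesh, d_audio, T = n_vertices * 3, 128, 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(              # predicts the noise that was added
    nn.Linear(d_mesh + d_audio + 1, 512), nn.SiLU(),
    nn.Linear(512, d_mesh),
)

mesh = torch.randn(8, d_mesh)          # stand-in ground-truth vertices
audio = torch.randn(8, d_audio)        # stand-in speech features

# Add noise at a random timestep t, then ask the audio-conditioned denoiser
# to recover that noise (standard DDPM epsilon-prediction objective).
t = torch.randint(0, T, (8,))
ab = alphas_bar[t].unsqueeze(1)
eps = torch.randn_like(mesh)
noisy = ab.sqrt() * mesh + (1 - ab).sqrt() * eps
pred = denoiser(torch.cat([noisy, audio, t.unsqueeze(1) / T], dim=1))
loss = nn.functional.mse_loss(pred, eps)
loss.backward()
print(float(loss))
```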
arXiv Detail & Related papers (2023-08-23T04:14:55Z) - The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link
between Phonemes and Facial Features [27.89284938655708]
This work unveils the enigmatic link between phonemes and facial features.
From a physiological perspective, each segment of speech -- a phoneme -- corresponds to different types of airflow and movement in the face.
Our results indicate that AMs are more predictable from vowels than from consonants, particularly plosives; a per-phoneme toy analysis is sketched below.
arXiv Detail & Related papers (2023-07-26T04:08:12Z)
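The sketch below shows, on synthetic data, the kind of per-phoneme analysis reported above: fit a regressor from segment features to an AM separately for vowel and plosive segments and compare held-out correlations. The data, and the signal-to-noise gap encoding the "vowels carry more vocal-tract information" assumption, are fabricated for illustration.

```python
# Per-phoneme-class AM predictability comparison (synthetic illustration).
import numpy as np

rng = np.random.default_rng(1)
d_feat, n_seg = 32, 300

def predictability(signal_to_noise):
    # Segments of one phoneme class: features x, target AM with class-specific noise.
    x = rng.normal(size=(n_seg, d_feat))
    w = rng.normal(size=d_feat)
    am = x @ w + rng.normal(size=n_seg) / signal_to_noise
    x_tr, x_te, am_tr, am_te = x[:200], x[200:], am[:200], am[200:]
    w_hat, *_ = np.linalg.lstsq(x_tr, am_tr, rcond=None)
    return np.corrcoef(x_te @ w_hat, am_te)[0, 1]

# Assumed effect direction: vowel segments are more informative than plosives.
print("vowel-segment correlation:   %.2f" % predictability(signal_to_noise=5.0))
print("plosive-segment correlation: %.2f" % predictability(signal_to_noise=0.5))
```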
arXiv Detail & Related papers (2023-07-26T04:08:12Z) - Parametric Implicit Face Representation for Audio-Driven Facial
Reenactment [52.33618333954383]
We propose a novel audio-driven facial reenactment framework that is both controllable and can generate high-quality talking heads.
Specifically, our method parameterizes the implicit representation with the interpretable parameters of 3D face models.
Our method can generate more realistic results than previous methods with greater fidelity to the identities and talking styles of speakers.
arXiv Detail & Related papers (2023-06-13T07:08:22Z) - EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior; a toy version of such a consistency loss is sketched below.
arXiv Detail & Related papers (2022-04-24T15:58:35Z)
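A toy version of a perceptual emotion consistency loss in the spirit described above: a frozen, stand-in emotion-feature network scores both the input image and a stand-in differentiable render of the reconstructed 3D face, and their features are pulled together. Both networks are placeholders, not EMOCA's models.

```python
# Perceptual emotion consistency loss sketch (placeholder networks).
import torch
import torch.nn as nn

emotion_net = nn.Sequential(           # stand-in for a pretrained recognizer
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 16),
)
for p in emotion_net.parameters():
    p.requires_grad_(False)            # frozen; only the reconstruction learns

input_img = torch.rand(2, 3, 64, 64)
rendered = torch.rand(2, 3, 64, 64, requires_grad=True)  # stand-in render

# Consistency: emotion features of the render must match the input image's,
# so gradients flow back into the 3D reconstruction through the renderer.
loss_emo = nn.functional.mse_loss(emotion_net(rendered), emotion_net(input_img))
loss_emo.backward()
print(float(loss_emo))
```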
arXiv Detail & Related papers (2022-04-24T15:58:35Z) - Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices? [16.716830359688853]
This work digs into a root question in human perception: can face geometry be gleaned from one's voice?
We propose our analysis framework, Cross-Modal Perceptionist, under both supervised and unsupervised learning.
arXiv Detail & Related papers (2022-03-18T10:03:07Z) - Controlled AutoEncoders to Generate Faces from Voices [30.062970046955577]
We propose a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation.
We evaluate the framework on the VoxCeleb and VGGFace datasets through human-subject studies and face retrieval.
arXiv Detail & Related papers (2021-07-16T16:04:29Z) - HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping [116.1022638063613]
We propose HifiFace, which can preserve the face shape of the source face and generate photo-realistic results.
We introduce the Semantic Facial Fusion module to optimize the combination of encoder and decoder features.
arXiv Detail & Related papers (2021-06-18T07:39:09Z) - Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices [18.600534152951926]
This work analyzes whether 3D face models can be learned from only the speech input of speakers.
We propose both supervised and unsupervised learning frameworks. In particular, we demonstrate how unsupervised learning is possible in the absence of a direct voice-to-3D-face dataset; a hedged sketch of one such recipe appears below.
arXiv Detail & Related papers (2021-04-21T01:14:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.