Speech Fusion to Face: Bridging the Gap Between Human's Vocal
Characteristics and Facial Imaging
- URL: http://arxiv.org/abs/2006.05888v1
- Date: Wed, 10 Jun 2020 15:19:31 GMT
- Title: Speech Fusion to Face: Bridging the Gap Between Human's Vocal
Characteristics and Facial Imaging
- Authors: Yeqi Bai, Tao Ma, Lipo Wang, Zhenjie Zhang
- Abstract summary: Facial image generation based on vocal characteristics from speech is an important yet challenging task.
Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity.
We propose Speech Fusion to Face, or SF2F, which attempts to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models.
- Score: 19.285149134711382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning technologies are now capable of generating realistic
images that can fool humans, research efforts are turning to the synthesis of
images for more concrete and application-specific purposes. Facial image
generation based on vocal characteristics from speech is one such important
yet challenging task. It is a key enabler of influential use cases of image
generation, especially for businesses in public security and entertainment.
Existing solutions to the speech2face problem render limited image quality
and fail to preserve facial similarity, due to the lack of quality datasets for
training and of appropriate integration of vocal features. In this paper, we
investigate these key technical challenges and propose Speech Fusion to Face,
or SF2F for short, which attempts to address the issue of facial image quality and
the poor connection between the vocal feature domain and modern image generation
models. By adopting new strategies on data model and training, we demonstrate
a dramatic performance boost over the state-of-the-art solution, doubling the
recall of individual identity and lifting the quality score from 15 to 19
based on the mutual information score with a VGGFace classifier.
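The abstract names two evaluation quantities without defining them: a quality score "based on the mutual information score with VGGFace classifier" and the "recall of individual identity". The sketch below shows one plausible reading of each, and both readings are assumptions rather than the authors' definitions: the quality score is computed in the style of the Inception Score (the exponential of the mean KL divergence between per-image class posteriors and their marginal), and identity recall is computed as retrieval recall@K over embeddings. The function names `classifier_mi_score` and `recall_at_k` are hypothetical, and the inputs stand in for outputs of a face classifier or face encoder that is not included here.

```python
import numpy as np


def classifier_mi_score(probs, eps=1e-12):
    """Inception-Score-style quality score (assumed form, not from the paper).

    probs: array of shape (n_images, n_classes); each row is a classifier
    posterior p(y|x) for one generated image and sums to 1.
    Returns exp(mean_x KL(p(y|x) || p(y))), where p(y) is the mean posterior.
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


def recall_at_k(query_emb, gallery_emb, k=1):
    """Retrieval recall@K (assumed reading of "recall of individual identity").

    query_emb[i] is the embedding of the face generated for identity i and
    gallery_emb[i] is the embedding of that identity's real face. A query
    counts as a hit if its own identity is among the k most similar gallery
    entries under cosine similarity.
    """
    q = np.asarray(query_emb, dtype=np.float64)
    g = np.asarray(gallery_emb, dtype=np.float64)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    sims = q @ g.T                                # (n_queries, n_gallery)
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of k best matches
    hits = (topk == np.arange(len(q))[:, None]).any(axis=1)
    return float(hits.mean())
```

Under this reading, the score ranges from 1 (every image gets the same posterior) up to the number of classes (each image is confidently assigned to a distinct, evenly used class), so a higher value indicates images that are both confidently recognizable and diverse.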
Related papers
- RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network [48.95833484103569]
RealTalk is an audio-to-expression transformer and a high-fidelity expression-to-face framework.
In the first component, we consider both identity and intra-personal variation features related to speaking lip movements.
In the second component, we design a lightweight facial identity alignment (FIA) module.
This novel design allows us to generate fine details in real-time, without depending on sophisticated and inefficient feature alignment modules.
arXiv Detail & Related papers (2024-06-26T12:09:59Z)
- Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation [56.46932751058042]
We train a learnable prompt prefix for text-to-image diffusion models, which forces the model to generate anonymized facial identities.
Experiments demonstrate the successful anonymization performance of APL, which anonymizes any specific individuals without compromising the quality of non-identity-specific image generation.
arXiv Detail & Related papers (2024-05-27T07:38:26Z)
- Adversarial Identity Injection for Semantic Face Image Synthesis [6.763801424109435]
We present an SIS architecture that exploits a cross-attention mechanism to merge identity, style, and semantic features to generate faces.
Experimental results reveal that the proposed method is not only suitable for preserving the identity but is also effective in the face recognition adversarial attack.
arXiv Detail & Related papers (2024-04-16T09:19:23Z)
- FlashFace: Human Image Personalization with High-fidelity Identity Preservation [59.76645602354481]
FlashFace allows users to easily personalize their own photos by providing one or a few reference face images and a text prompt.
Our approach is distinguishable from existing human photo customization methods by higher-fidelity identity preservation and better instruction following.
arXiv Detail & Related papers (2024-03-25T17:59:57Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- Effective Adapter for Face Recognition in the Wild [72.75516495170199]
We tackle the challenge of face recognition in the wild, where images often suffer from low quality and real-world distortions.
Traditional approaches, which train models either directly on degraded images or on their enhanced counterparts produced by face restoration techniques, have proven ineffective.
We propose an effective adapter for augmenting existing face recognition models trained on high-quality facial datasets.
arXiv Detail & Related papers (2023-12-04T08:55:46Z)
- Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior [13.198105709331617]
We propose a novel speech-to-face generation framework, which leverages a Speech-Conditioned Latent Diffusion Model, called SCLDM.
This is the first work to harness the exceptional modeling capabilities of diffusion models for speech-to-face generation.
We show that our method can produce more realistic face images while preserving the identity of the speaker better than state-of-the-art methods.
arXiv Detail & Related papers (2023-10-05T07:44:49Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications, including virtual try-on and 2D talking head, to build a broader playground that better shows its value.
arXiv Detail & Related papers (2023-08-28T02:20:44Z)
- SynFace: Face Recognition with Synthetic Data [83.15838126703719]
We devise SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the performance gap.
We also perform a systematic empirical analysis of synthetic face images to provide insights into how to effectively utilize synthetic data for face recognition.
arXiv Detail & Related papers (2021-08-18T03:41:54Z)
- Network Architecture Search for Face Enhancement [82.25775020564654]
We present a multi-task face restoration network, called Network Architecture Search for Face Enhancement (NASFE).
NASFE can enhance poor-quality face images containing a single degradation (i.e. noise or blur) or multiple degradations (noise+blur+low-light).
arXiv Detail & Related papers (2021-05-13T19:46:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.