LaughTalk: Expressive 3D Talking Head Generation with Laughter
- URL: http://arxiv.org/abs/2311.00994v1
- Date: Thu, 2 Nov 2023 05:04:33 GMT
- Title: LaughTalk: Expressive 3D Talking Head Generation with Laughter
- Authors: Kim Sung-Bin, Lee Hyun, Da Hye Hong, Suekyeong Nam, Janghoon Ju,
Tae-Hyun Oh
- Abstract summary: We introduce a novel task to generate 3D talking heads capable of both articulate speech and authentic laughter.
Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters.
Our method performs favorably compared to existing approaches in both talking head generation and expressing laughter signals.
- Score: 15.60843963655039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Laughter is a unique expression, essential to affirmative social
interactions among humans. Although current 3D talking head generation methods
produce convincing verbal articulations, they often fail to capture the
vitality and subtleties of laughter and smiles, despite their importance in
social contexts.
In this paper, we introduce a novel task to generate 3D talking heads capable
of both articulate speech and authentic laughter. Our newly curated dataset
comprises 2D laughing videos paired with pseudo-annotated and human-validated
3D FLAME parameters and vertices. Given our proposed dataset, we present a
strong baseline with a two-stage training scheme: the model first learns to
talk and then acquires the ability to express laughter. Extensive experiments
demonstrate that our method performs favorably compared to existing approaches
in both talking head generation and expressing laughter signals. We further
explore potential applications on top of our proposed method for rigging
realistic avatars.
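
Since the abstract only names the two-stage scheme without an implementation, the following is a minimal PyTorch-style sketch of the idea: a stand-in audio-to-FLAME regressor is first trained on ordinary speech data and then fine-tuned on laughter clips. The architecture, feature dimensions, and toy datasets are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the two-stage training scheme (not the authors' code).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

AUDIO_DIM, FLAME_DIM = 768, 53   # assumed sizes: wav2vec-style features -> expression + jaw-pose params

# Stand-in audio-to-FLAME regressor; the real model is more elaborate.
model = nn.Sequential(nn.Linear(AUDIO_DIM, 256), nn.ReLU(), nn.Linear(256, FLAME_DIM))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_stage(loader, epochs):
    """Regress per-frame FLAME parameters from audio features."""
    for _ in range(epochs):
        for audio_feat, flame_target in loader:
            loss = nn.functional.mse_loss(model(audio_feat), flame_target)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Toy tensors standing in for (audio feature, FLAME parameter) pairs.
speech_data = TensorDataset(torch.randn(256, AUDIO_DIM), torch.randn(256, FLAME_DIM))
laugh_data = TensorDataset(torch.randn(64, AUDIO_DIM), torch.randn(64, FLAME_DIM))

# Stage 1: the model first learns to talk on ordinary speech clips.
train_stage(DataLoader(speech_data, batch_size=32), epochs=2)
# Stage 2: it then acquires laughter by fine-tuning on the curated,
# human-validated laughing-video annotations.
train_stage(DataLoader(laugh_data, batch_size=32), epochs=2)
```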
Related papers
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [30.138347111341748]
We present a novel approach for synthesizing 3D talking heads with controllable emotion.
Our model enables controllable emotion in the generated talking heads, which can be rendered from a wide range of views.
Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.
arXiv Detail & Related papers (2024-08-01T05:46:57Z)
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads [12.161006152509653]
We propose an innovative data-driven technique for creating a synthetic dataset, called EmoVOCA.
We then designed and trained an emotional 3D talking head generator that accepts a 3D face, an audio file, an emotion label, and an intensity value as inputs, and learns to animate audio-synchronized lip movements while conveying the expressive traits of the face (a hedged interface sketch follows this list).
arXiv Detail & Related papers (2024-03-19T16:33:26Z)
- AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation [28.71632683090641]
We propose an Audio-Visual Instruction system for expressive talking face generation.
Instead of directly learning facial movements from human speech, our two-stage strategy first has LLMs comprehend the audio information.
This two-stage process, coupled with the incorporation of LLMs, enhances model interpretability and provides users with flexibility to comprehend instructions.
arXiv Detail & Related papers (2024-02-25T15:51:05Z)
- Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like [49.2096391012794]
ELaTE is a zero-shot TTS that can generate natural laughing speech of any speaker based on a short audio prompt.
We develop our model based on the foundation of conditional flow-matching-based zero-shot TTS.
We show that ELaTE can generate laughing speech with significantly higher quality and controllability compared to conventional models.
arXiv Detail & Related papers (2024-02-12T02:58:10Z)
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z)
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [66.43223397997559]
We aim to synthesize high-quality talking portrait videos corresponding to the input text.
This task has broad application prospects in the digital human industry but has not been technically achieved yet.
We introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which designs a generic zero-shot multi-speaker Text-to-Speech model.
arXiv Detail & Related papers (2023-06-06T08:50:13Z)
- Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models [35.688696422879175]
We propose a novel model capable of generating realistic laughter sequences, given a still portrait and an audio clip containing laughter.
We train our model on a diverse set of laughter datasets and introduce an evaluation metric specifically designed for laughter.
Our model achieves state-of-the-art performance across all metrics, even when existing models are re-trained for laughter generation.
arXiv Detail & Related papers (2023-05-15T17:59:57Z)
- Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis [90.43371339871105]
We propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis.
DFRF conditions the face radiance field on 2D appearance images to learn a face prior.
Experiments show DFRF can synthesize natural and high-quality audio-driven talking head videos for novel identities with only 40k iterations.
arXiv Detail & Related papers (2022-07-24T16:46:03Z)
- DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation [54.84137342837465]
Face-to-face conversations account for the vast majority of daily conversations.
Most existing methods focus on single-person talking head generation.
We propose a novel unified framework based on neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-03-15T14:16:49Z)
- LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example [55.10864476206503]
We propose a model called LaughNet for synthesizing laughter by using waveform silhouettes as inputs.
The results show that LaughNet can synthesize laughter utterances with moderate quality and retain the characteristics of the training example.
arXiv Detail & Related papers (2021-10-11T00:45:07Z)
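
As referenced in the EmoVOCA entry above, the following is a hedged sketch of what such a generator's interface could look like, given the inputs listed in that summary (a 3D face, an audio clip, an emotion label, and an intensity value). The class, function, tensor shapes, and placeholder body are illustrative assumptions, not the authors' published API.

```python
# Hypothetical interface for an EmoVOCA-style emotional talking head generator.
from dataclasses import dataclass
import numpy as np

@dataclass
class AnimationRequest:
    face_vertices: np.ndarray   # (V, 3) neutral 3D face mesh
    audio: np.ndarray           # raw waveform samples (assumed 16 kHz)
    emotion: str                # e.g. "happy", "angry"
    intensity: float            # 0.0 (neutral) .. 1.0 (maximum expressiveness)

def animate(request: AnimationRequest, fps: int = 30) -> np.ndarray:
    """Return per-frame vertex offsets (T, V, 3) that sync lips to the audio
    while blending in the requested emotion at the given intensity.
    The body is a placeholder; the real generator is a learned network."""
    n_frames = max(1, int(len(request.audio) / 16000 * fps))
    return np.zeros((n_frames, request.face_vertices.shape[0], 3))

# Example call with toy inputs (FLAME meshes have 5023 vertices).
req = AnimationRequest(
    face_vertices=np.zeros((5023, 3)),
    audio=np.zeros(16000),      # one second of silence
    emotion="happy",
    intensity=0.8,
)
offsets = animate(req)
```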