Interactive Conversational Head Generation
- URL: http://arxiv.org/abs/2307.02090v1
- Date: Wed, 5 Jul 2023 08:06:26 GMT
- Title: Interactive Conversational Head Generation
- Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao
- Abstract summary: We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation.
The capability to automatically synthesize interlocutors that can participate in long, multi-turn conversations is vital and offers benefits for various applications.
- Score: 68.76774230274076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new conversation head generation benchmark for synthesizing
behaviors of a single interlocutor in a face-to-face conversation. The
capability to automatically synthesize interlocutors that can participate in
long, multi-turn conversations is vital and offers benefits for various
applications, including digital humans, virtual agents, and social robots.
Existing research, however, primarily focuses on talking head generation
(one-way interaction), which hinders the creation of a digital human capable
of conversational (two-way) interaction due to the absence of listening and
interaction components. In
this work, we construct two datasets to address this issue, ``ViCo'' for
independent talking and listening head generation tasks at the sentence level,
and ``ViCo-X'', for synthesizing interlocutors in multi-turn conversational
scenarios. Based on ViCo and ViCo-X, we define three novel tasks targeting the
interaction modeling during the face-to-face conversation: 1) responsive
listening head generation making listeners respond actively to the speaker with
non-verbal signals, 2) expressive talking head generation guiding speakers to
be aware of listeners' behaviors, and 3) conversational head generation to
integrate the talking/listening ability in one interlocutor. Along with the
datasets, we also propose corresponding baseline solutions to the three
aforementioned tasks. Experimental results show that our baseline method can
generate responsive and vivid agents that collaborate with a real person to
complete the whole conversation. Project page: https://vico.solutions/.
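The three tasks above are defined mainly by what each interlocutor observes and produces. As a rough, interface-level sketch (a hypothetical illustration in Python, not the authors' implementation; every class and parameter name here is an assumption), they could be summarized as follows:

```python
# Hypothetical, interface-level sketch of the three ViCo/ViCo-X tasks.
# None of these names come from the paper's code; they only illustrate the
# inputs and outputs each task is described as using.
from typing import Protocol, Sequence


class HeadFrame:
    """One frame of head motion/expression (e.g. 3DMM coefficients or an image)."""


class ResponsiveListeningHead(Protocol):
    # Task 1: react to the speaker with non-verbal signals (nods, expressions, gaze).
    def generate(self, speaker_audio: bytes, speaker_frames: Sequence[HeadFrame],
                 listener_identity: HeadFrame) -> Sequence[HeadFrame]: ...


class ExpressiveTalkingHead(Protocol):
    # Task 2: speak while staying aware of the listener's observed behavior.
    def generate(self, speech_audio: bytes, listener_frames: Sequence[HeadFrame],
                 speaker_identity: HeadFrame) -> Sequence[HeadFrame]: ...


class ConversationalHead(Protocol):
    # Task 3: one interlocutor that switches between the talking and listening
    # roles across turns of a multi-turn conversation (the ViCo-X setting).
    def step(self, role: str, **observations) -> Sequence[HeadFrame]: ...
```

Under this reading, a conversational head generator alternates between the two roles turn by turn: it drives the listening model while the real partner speaks and the talking model on its own turns.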
Related papers
- Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication [17.294279444027563]
We introduce an innovative task focused on human communication, aiming to generate 3D holistic human motions for both speakers and listeners.
We consider the real-time mutual influence between the speaker and the listener and propose a novel chain-like transformer-based auto-regressive model.
Our approach demonstrates state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2024-03-28T14:47:32Z)
- Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks [30.33718960981521]
Personality recognition is useful for enhancing robots' ability to tailor user-adaptive responses.
One of the challenges in this task is a limited number of speakers in existing dialogue corpora.
arXiv Detail & Related papers (2024-01-11T12:27:33Z)
- Visual-Aware Text-to-Speech [101.89332968344102]
We present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and visual feedback of the listener in face-to-face communication.
We devise a baseline model to fuse phoneme linguistic information and listener visual signals for speech synthesis.
arXiv Detail & Related papers (2023-06-21T05:11:39Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation [54.84137342837465]
Face-to-face conversations account for the vast majority of daily conversations.
Most existing methods focus on single-person talking head generation.
We propose a novel unified framework based on neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-03-15T14:16:49Z)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of a non-verbal head with motions and expressions reacting to multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modalities in this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z)
- Few-shot Language Coordination by Modeling Theory of Mind [95.54446989205117]
We study the task of few-shot language coordination.
We require the lead agent to coordinate with a population of agents with different linguistic abilities.
This requires the ability to model the partner's beliefs, a vital component of human communication.
arXiv Detail & Related papers (2021-07-12T19:26:11Z)
- Intelligent Conversational Android ERICA Applied to Attentive Listening and Job Interview [41.789773897391605]
We have developed an intelligent conversational android ERICA.
We set up several social interaction tasks for ERICA, including attentive listening, job interview, and speed dating.
It has been evaluated with 40 senior people, who engaged in conversations of 5-7 minutes without a conversation breakdown.
arXiv Detail & Related papers (2021-05-02T06:37:23Z)