ReactFace: Multiple Appropriate Facial Reaction Generation in Dyadic
Interactions
- URL: http://arxiv.org/abs/2305.15748v1
- Date: Thu, 25 May 2023 05:55:53 GMT
- Title: ReactFace: Multiple Appropriate Facial Reaction Generation in Dyadic
Interactions
- Authors: Cheng Luo, Siyang Song, Weicheng Xie, Micol Spitale, Linlin Shen,
Hatice Gunes
- Abstract summary: In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions may be appropriate in response to the same speaker's behaviour.
This paper presents a novel framework called ReactFace that learns an appropriate facial reaction distribution from a speaker's behaviour.
- Score: 29.882412173055172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In dyadic interaction, predicting the listener's facial reactions is
challenging as different reactions may be appropriate in response to the same
speaker's behaviour. This paper presents a novel framework called ReactFace
that learns an appropriate facial reaction distribution from a speaker's
behaviour rather than replicating the real facial reaction of the listener.
ReactFace generates multiple different but appropriate photo-realistic human
facial reactions by (i) learning an appropriate facial reaction distribution
representing multiple appropriate facial reactions; and (ii) synchronizing the
generated facial reactions with the speaker's verbal and non-verbal behaviours
at each time stamp, resulting in realistic 2D facial reaction sequences.
Experimental results demonstrate the effectiveness of our approach in
generating multiple diverse, synchronized, and appropriate facial reactions
from each speaker's behaviour, with the quality of the generated reactions
being influenced by the speaker's speech and facial behaviours. Our code is
made publicly available at \url{https://github.com/lingjivoo/ReactFace}.
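
To make the abstract's two-step recipe concrete, here is a minimal sketch, assuming a PyTorch setup, of how a model might (i) map per-frame speaker behaviour to a distribution over listener-reaction latents and (ii) decode several samples from it into synchronized coefficient sequences. This is not the authors' implementation; every module, dimension, and name below is an illustrative assumption.

```python
# Minimal sketch (not the authors' code) of the two-step idea in ReactFace:
# (i) learn a distribution over appropriate listener reactions conditioned on
# speaker behaviour, (ii) decode samples frame-by-frame so each reaction stays
# synchronized with the speaker. All names and sizes are illustrative.
import torch
import torch.nn as nn

class ReactionDistribution(nn.Module):
    def __init__(self, speaker_dim=128, latent_dim=64, coeff_dim=58):
        super().__init__()
        # Fuse per-frame speaker audio+video features over time.
        self.speaker_enc = nn.GRU(speaker_dim, 128, batch_first=True)
        # Predict a per-frame Gaussian over listener-reaction latents.
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        # Decode a sampled latent (plus speaker context) to facial coefficients.
        self.decoder = nn.GRU(latent_dim + 128, 128, batch_first=True)
        self.to_coeff = nn.Linear(128, coeff_dim)

    def forward(self, speaker_feats, num_samples=3):
        # speaker_feats: (batch, time, speaker_dim)
        ctx, _ = self.speaker_enc(speaker_feats)           # (B, T, 128)
        mu, logvar = self.to_mu(ctx), self.to_logvar(ctx)  # (B, T, latent)
        reactions = []
        for _ in range(num_samples):
            # Reparameterised sample: each draw is a different but
            # appropriate reaction to the same speaker behaviour.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            h, _ = self.decoder(torch.cat([z, ctx], dim=-1))
            reactions.append(self.to_coeff(h))             # (B, T, coeff_dim)
        return torch.stack(reactions, dim=1)               # (B, S, T, coeff_dim)

model = ReactionDistribution()
speaker = torch.randn(1, 50, 128)  # 50 frames of fused audio+video features
samples = model(speaker)           # 3 distinct reaction sequences
print(samples.shape)               # torch.Size([1, 3, 50, 58])
```

The point of the sketch is the sampling loop: because the model learns a distribution rather than a single target, each draw of z yields a different but plausible reaction to the same speaker clip.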
Related papers
- ReGenNet: Towards Human Action-Reaction Synthesis [87.57721371471536]
We analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions.
We propose the first multi-setting human action-reaction benchmark to generate human reactions conditioned on given human actions.
arXiv Detail & Related papers (2024-03-18T15:33:06Z)
- Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation [50.35367785674921]
Listener head generation centers on generating non-verbal behaviors of a listener in reference to the information delivered by a speaker.
A significant challenge when generating such responses is the non-deterministic nature of fine-grained facial expressions during a conversation.
We propose the Emotional Listener Portrait (ELP), which treats each fine-grained facial motion as a composition of several discrete motion-codewords.
Our ELP model can not only generate natural and diverse responses to a given speaker by sampling from the learned distribution, but also produce controllable responses with a predetermined attitude (a minimal sketch of this codeword idea appears after this list).
arXiv Detail & Related papers (2023-09-29T18:18:32Z)
- MRecGen: Multimodal Appropriate Reaction Generator [31.60823534748163]
This paper proposes the first framework for generating multiple appropriate multimodal (verbal and nonverbal) human reactions.
It can be applied to various human-computer interaction scenarios by generating appropriate virtual agent/robot behaviours.
arXiv Detail & Related papers (2023-07-05T19:07:00Z)
- Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation [22.579200870471475]
This paper proposes the first multiple appropriate facial reaction generation framework.
It reformulates the one-to-many facial reaction generation problem as a one-to-one mapping problem.
Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronized facial reactions.
arXiv Detail & Related papers (2023-05-24T15:56:26Z)
- Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z)
- Emotionally Enhanced Talking Face Generation [52.07451348895041]
We build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions.
We show that our model can adapt to arbitrary identities, emotions, and languages.
Our proposed framework is equipped with a user-friendly web interface that offers real-time talking face generation with emotions.
arXiv Detail & Related papers (2023-03-21T02:33:27Z)
- Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How? [11.130984858239412]
This paper defines the Multiple Appropriate Reaction Generation task for the first time in the literature.
It then proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions.
The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
arXiv Detail & Related papers (2023-02-13T16:49:27Z)
- Imitator: Personalized Speech-driven 3D Facial Animation [63.57811510502906]
State-of-the-art methods deform the face topology of the target actor to sync with the input audio, without considering the identity-specific speaking style and facial idiosyncrasies of the target actor.
We present Imitator, a speech-driven facial expression synthesis method, which learns identity-specific details from a short input video.
We show that our approach produces temporally coherent facial expressions from input audio while preserving the speaking style of the target actors.
arXiv Detail & Related papers (2022-12-30T19:00:02Z)
- Responsive Listening Head Generation: A Benchmark Dataset and Baseline [58.168958284290156]
We define the responsive listening head generation task as the synthesis of non-verbal head motions and expressions reacting to multiple inputs.
Unlike speech-driven gesture or talking head generation, we introduce more modalities in this task, hoping to benefit several research fields.
arXiv Detail & Related papers (2021-12-27T07:18:50Z)
- Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks [0.3848364262836075]
This work shows that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions.
Our best model reaches a classification accuracy of 98.2%.
The insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas.
arXiv Detail & Related papers (2020-12-09T10:25:30Z)
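
Returning to the Emotional Listener Portrait entry above: composing facial motion from discrete motion-codewords is a natural fit for a VQ-style codebook with a per-frame categorical distribution. The sketch below illustrates that idea only; the codebook size, attitude labels, and all module names are assumptions, not the paper's implementation.

```python
# Hedged sketch of the discrete motion-codeword idea behind Emotional Listener
# Portrait (not the authors' code): a learned codebook of facial motion
# primitives, a per-frame categorical distribution over codewords, and an
# attitude label for controllable sampling. Names and sizes are illustrative.
import torch
import torch.nn as nn

class CodewordListener(nn.Module):
    def __init__(self, speaker_dim=128, num_codes=256, code_dim=64, num_attitudes=3):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)  # motion codewords
        self.attitude_emb = nn.Embedding(num_attitudes, 32)
        # Predict per-frame logits over codewords from speaker features + attitude.
        self.net = nn.GRU(speaker_dim + 32, 128, batch_first=True)
        self.to_logits = nn.Linear(128, num_codes)

    def forward(self, speaker_feats, attitude):
        B, T, _ = speaker_feats.shape
        att = self.attitude_emb(attitude)[:, None, :].expand(B, T, -1)
        h, _ = self.net(torch.cat([speaker_feats, att], dim=-1))
        logits = self.to_logits(h)                         # (B, T, num_codes)
        # Sampling (rather than argmax) yields diverse yet plausible responses.
        idx = torch.distributions.Categorical(logits=logits).sample()
        return self.codebook(idx)                          # (B, T, code_dim)

model = CodewordListener()
speaker = torch.randn(1, 50, 128)
friendly = torch.tensor([0])       # hypothetical attitude id
motion = model(speaker, friendly)  # one sampled motion-codeword sequence
print(motion.shape)                # torch.Size([1, 50, 64])
```

Sampling from the categorical distribution is what produces diverse responses, while conditioning on the attitude embedding is one plausible way to make them controllable, as the ELP summary describes.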
This list is automatically generated from the titles and abstracts of the papers on this site.