Reversible Graph Neural Network-based Reaction Distribution Learning for
Multiple Appropriate Facial Reactions Generation
- URL: http://arxiv.org/abs/2305.15270v3
- Date: Thu, 16 Nov 2023 16:45:45 GMT
- Title: Reversible Graph Neural Network-based Reaction Distribution Learning for
Multiple Appropriate Facial Reactions Generation
- Authors: Tong Xu, Micol Spitale, Hao Tang, Lu Liu, Hatice Gunes, Siyang Song
- Abstract summary: This paper proposes the first multiple appropriate facial reaction generation framework.
It re-formulates the one-to-many mapping facial reaction generation problem as a one-to-one mapping problem.
Experimental results demonstrate that our approach outperforms existing models in generating more appropriate, realistic, and synchronized facial reactions.
- Score: 22.579200870471475
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Generating facial reactions in a human-human dyadic interaction is complex
and highly dependent on the context, since more than one facial reaction can be
appropriate for the speaker's behaviour. This has challenged existing machine
learning (ML) methods, whose training strategies force models to reproduce a
specific (not multiple) facial reaction from each input speaker behaviour. This
paper proposes the first multiple appropriate facial reaction generation
framework that re-formulates the one-to-many mapping facial reaction generation
problem as a one-to-one mapping problem. This means that we approach this
problem by considering the generation of a distribution of the listener's
appropriate facial reactions instead of multiple different appropriate facial
reactions, i.e., 'many' appropriate facial reaction labels are summarised as
'one' distribution label during training. Our model consists of a perceptual
processor, a cognitive processor, and a motor processor. The motor processor is
implemented with a novel Reversible Multi-dimensional Edge Graph Neural Network
(REGNN). This allows us to obtain a distribution of appropriate real facial
reactions during the training process, enabling the cognitive processor to be
trained to predict the appropriate facial reaction distribution. At the
inference stage, the REGNN decodes an appropriate facial reaction by using this
distribution as input. Experimental results demonstrate that our approach
outperforms existing models in generating more appropriate, realistic, and
synchronized facial reactions. The improved performance is largely attributed
to the proposed appropriate facial reaction distribution learning strategy and
the use of a REGNN. The code is available at
https://github.com/TongXu-05/REGNN-Multiple-Appropriate-Facial-Reaction-Generation.
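The distribution-learning idea in the abstract can be illustrated with a small sketch. It is not the authors' REGNN: it swaps in a plain additive coupling layer (a standard invertible mapping) for the reversible graph network, and it assumes, purely for illustration, that the "one distribution label" is a per-dimension Gaussian over latent codes. All function names (`forward`, `inverse`, `summarise`, `sample_reaction`) and the toy reaction vectors are hypothetical.

```python
import math
import random
import statistics


def _shift(half):
    # Arbitrary fixed function of the first half. It is never inverted,
    # so any deterministic mapping works (a learned network in practice).
    total = sum(half)
    return [math.tanh(total + i) for i in range(len(half))]


def forward(reaction):
    # Reaction -> latent code: y1 = x1, y2 = x2 + shift(x1).
    if len(reaction) % 2:
        raise ValueError("this sketch assumes an even-length vector")
    mid = len(reaction) // 2
    x1, x2 = reaction[:mid], reaction[mid:]
    return x1 + [b + s for b, s in zip(x2, _shift(x1))]


def inverse(latent):
    # Latent code -> reaction: x1 = y1, x2 = y2 - shift(y1).
    mid = len(latent) // 2
    y1, y2 = latent[:mid], latent[mid:]
    return y1 + [b - s for b, s in zip(y2, _shift(y1))]


def summarise(reactions):
    # Training side: 'many' appropriate reactions become 'one' distribution
    # label, here a per-dimension Gaussian (mean, std) in latent space.
    latents = [forward(r) for r in reactions]
    dims = list(zip(*latents))
    mean = [statistics.fmean(d) for d in dims]
    std = [statistics.pstdev(d) for d in dims]
    return mean, std


def sample_reaction(mean, std, rng):
    # Inference side: draw a latent code from the distribution and decode
    # it with the inverse pass of the same reversible mapping.
    latent = [rng.gauss(m, s) for m, s in zip(mean, std)]
    return inverse(latent)


if __name__ == "__main__":
    # Toy 4-dimensional "reactions" that are all appropriate for one input.
    appropriate = [
        [0.2, -0.1, 0.5, 0.3],
        [0.1, 0.0, 0.4, 0.6],
        [0.3, -0.2, 0.6, 0.2],
    ]
    mean, std = summarise(appropriate)
    print(sample_reaction(mean, std, random.Random(0)))
```

Because the mapping is invertible, the same parameters serve both directions: the forward pass turns real reactions into a distribution target, and the inverse pass turns a sample from that distribution back into a reaction. In the paper's setting, a separate predictor (the cognitive processor) would be trained to output the distribution from the speaker's behaviour; that part is omitted here.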
Related papers
- REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge [36.84914349494818]
In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues.
How to develop a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions is a challenging task.
This paper presents the guidelines of the REACT 2024 challenge and the dataset utilized in the challenge.
arXiv Detail & Related papers (2024-01-10T14:01:51Z)
- MRecGen: Multimodal Appropriate Reaction Generator [31.60823534748163]
This paper proposes the first multiple and multimodal (verbal and nonverbal) appropriate human reaction generation framework.
It can be applied to various human-computer interaction scenarios by generating appropriate virtual agent/robot behaviours.
arXiv Detail & Related papers (2023-07-05T19:07:00Z)
- ReactFace: Online Multiple Appropriate Facial Reaction Generation in Dyadic Interactions [46.66378299720377]
In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions could be appropriate in response to the same speaker's behaviour.
This paper reformulates the task as an extrapolation or prediction problem, and proposes a novel framework (called ReactFace) to generate multiple different but appropriate facial reactions.
arXiv Detail & Related papers (2023-05-25T05:55:53Z)
- Multiple Appropriate Facial Reaction Generation in Dyadic Interaction Settings: What, Why and How? [11.130984858239412]
This paper defines the Multiple Appropriate Reaction Generation task for the first time in the literature.
It then proposes a new set of objective evaluation metrics to evaluate the appropriateness of the generated reactions.
The paper subsequently introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.
arXiv Detail & Related papers (2023-02-13T16:49:27Z)
- Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation for face parsing.
Specifically, DML-CSR designs a multi-task model which comprises face parsing, binary edge, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z)
- TANet: A new Paradigm for Global Face Super-resolution via Transformer-CNN Aggregation Network [72.41798177302175]
We propose a novel paradigm based on the self-attention mechanism (i.e., the core of Transformer) to fully explore the representation capacity of the facial structure feature.
Specifically, we design a Transformer-CNN aggregation network (TANet) consisting of two paths, in which one path uses CNNs responsible for restoring fine-grained facial details.
By aggregating the features from the above two paths, the consistency of global facial structure and fidelity of local facial detail restoration are strengthened simultaneously.
arXiv Detail & Related papers (2021-09-16T18:15:07Z)
- Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions [4.4532095214807965]
Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest.
The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.
arXiv Detail & Related papers (2020-10-21T13:11:45Z)
- Facial Emotion Recognition with Noisy Multi-task Annotations [88.42023952684052]
We introduce a new problem of facial emotion recognition with noisy multi-task annotations.
For this new problem, we suggest a formulation from the point of joint distribution match view.
We exploit a new method to enable the emotion prediction and the joint distribution learning.
arXiv Detail & Related papers (2020-10-19T20:39:37Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
- Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering the phenomenon of uneven data distribution and lack of samples is common in real-world scenarios, we evaluate several tasks of few-shot expression learning.
We propose a unified task-driven framework - Compositional Generative Adversarial Network (Comp-GAN) learning to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.