Bridging Modalities: Knowledge Distillation and Masked Training for
Translating Multi-Modal Emotion Recognition to Uni-Modal, Speech-Only Emotion
Recognition
- URL: http://arxiv.org/abs/2401.03000v1
- Date: Thu, 4 Jan 2024 22:42:14 GMT
- Authors: Muhammad Muaz and Nathan Paull and Jahnavi Malagavalli
- Abstract summary: This paper presents an innovative approach to address the challenges of translating multi-modal emotion recognition models to a more practical uni-modal counterpart.
Recognizing emotions from speech signals is a critical task with applications in human-computer interaction, affective computing, and mental health assessment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an innovative approach to address the challenges of
translating multi-modal emotion recognition models to a more practical and
resource-efficient uni-modal counterpart, specifically focusing on speech-only
emotion recognition. Recognizing emotions from speech signals is a critical
task with applications in human-computer interaction, affective computing, and
mental health assessment. However, existing state-of-the-art models often rely
on multi-modal inputs, incorporating information from multiple sources such as
facial expressions and gestures, which may not be readily available or feasible
in real-world scenarios. To tackle this issue, we propose a novel framework
that leverages knowledge distillation and masked training techniques.
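The abstract names the two techniques without spelling out how they combine. Below is a minimal, hypothetical sketch of one plausible reading: a frozen multi-modal teacher supervises a speech-only student via soft-label distillation, and "masked training" is interpreted as randomly zeroing the teacher's non-speech inputs so its targets remain informative when only speech is available. All class names, dimensions, losses, and the masking scheme are assumptions, not the paper's actual design.

```python
# Hypothetical sketch only: the paper's actual architectures, losses, and
# masking strategy are not specified in this abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalTeacher(nn.Module):
    """Stand-in teacher that fuses speech, face, and text feature vectors."""
    def __init__(self, dim=128, n_emotions=4):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, dim)
        self.head = nn.Linear(dim, n_emotions)

    def forward(self, speech, face, text):
        h = torch.relu(self.fuse(torch.cat([speech, face, text], dim=-1)))
        return self.head(h)

class SpeechOnlyStudent(nn.Module):
    """Uni-modal student that sees only the speech features."""
    def __init__(self, dim=128, n_emotions=4):
        super().__init__()
        self.enc = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, n_emotions)

    def forward(self, speech):
        return self.head(torch.relu(self.enc(speech)))

def distillation_step(teacher, student, batch, optimizer,
                      T=2.0, alpha=0.5, mask_p=0.5):
    speech, face, text, labels = batch
    # "Masked training" read as: randomly zero the non-speech modalities so
    # the teacher's soft targets do not lean on cues the student never sees.
    if torch.rand(()).item() < mask_p:
        face, text = torch.zeros_like(face), torch.zeros_like(text)
    with torch.no_grad():
        t_logits = teacher(speech, face, text)   # frozen teacher
    s_logits = student(speech)
    # Standard soft-label distillation loss plus cross-entropy on hard labels.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(s_logits, labels)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this toy setup only the student is needed at inference time, which is where the resource saving over the multi-modal teacher would come from.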
Related papers
- In-Depth Analysis of Emotion Recognition through Knowledge-Based Large Language Models
This paper contributes to the emerging field of context-based emotion recognition.
We propose an approach that combines emotion recognition methods with Bayesian Cue Integration.
We test this approach in the context of interpreting facial expressions during a social task, the prisoner's dilemma.
arXiv Detail & Related papers (2024-07-17T06:39:51Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from the speech and text modalities (a minimal late-fusion sketch follows this list).
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture (IEMOCAP) dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Multi-Cue Adaptive Emotion Recognition Network
We propose a new deep learning approach for emotion recognition based on adaptive multi-cues.
We compare the proposed approach with state-of-the-art approaches on the CAER-S dataset.
arXiv Detail & Related papers (2021-11-03T15:08:55Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked text prediction (a toy prompting sketch follows this list).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction to widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z)
- Emotion-aware Chat Machine: Automatic Emotional Response Generation for Human-like Emotional Interaction
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
Experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
arXiv Detail & Related papers (2021-06-06T06:26:15Z)
- Target Guided Emotion Aware Chat Machine
The consistency of a response to a given post at both the semantic and emotional levels is essential for a dialogue system to deliver human-like interactions.
This article proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post.
arXiv Detail & Related papers (2020-11-15T01:55:37Z)
- Facial Emotion Recognition with Noisy Multi-task Annotations
We introduce a new problem of facial emotion recognition with noisy multi-task annotations.
For this new problem, we suggest a formulation from the viewpoint of joint distribution matching.
We exploit a new method to enable emotion prediction and joint distribution learning.
arXiv Detail & Related papers (2020-10-19T20:39:37Z)
- Temporal aggregation of audio-visual modalities for emotion recognition
We propose a multimodal fusion technique for emotion recognition based on combining audio-visual modalities from a temporal window with different temporal offsets for each modality.
Our proposed method outperforms other methods from the literature as well as the human accuracy rating.
arXiv Detail & Related papers (2020-07-08T18:44:15Z)
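For the late-fusion entry above, the idea can be illustrated with a tiny, hypothetical decision-level fusion of two uni-modal classifiers; the fusion weights and four-class setup are assumptions, not the cited paper's configuration.

```python
# Hypothetical decision-level (late) fusion of two uni-modal classifiers;
# the weights and four-class setup are illustrative, not the cited paper's.
import torch
import torch.nn.functional as F

speech_logits = torch.tensor([[2.0, 0.5, -1.0, 0.1]])  # from a speech model
text_logits = torch.tensor([[1.5, 1.0, -0.5, 0.0]])    # from a text model

# Weighted average of the per-class probabilities from each modality.
fused = (0.6 * F.softmax(speech_logits, dim=-1)
         + 0.4 * F.softmax(text_logits, dim=-1))
print(fused.argmax(dim=-1))  # index of the fused emotion prediction
```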
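For the MEmoBERT entry, the prompt-as-masked-prediction idea can be illustrated with a plain text-only masked language model. MEmoBERT itself is multimodal and pre-trained on multimodal data, so the model choice, prompt wording, and verbalizer below are stand-in assumptions.

```python
# Toy illustration of prompt-based emotion prediction in the MEmoBERT spirit,
# using a text-only masked LM; treat this as a conceptual sketch only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Verbalizer: one plain word per emotion class (an assumption, not MEmoBERT's).
emotion_words = ["happy", "sad", "angry", "neutral"]
word_ids = tokenizer.convert_tokens_to_ids(emotion_words)

# Reformulate classification as filling a masked slot in a prompt.
utterance = "I just got the job I always wanted!"
prompt = f"{utterance} i feel {tokenizer.mask_token}."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
# Score only the verbalizer words at the [MASK] position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
scores = logits[0, mask_pos, word_ids]
print(emotion_words[scores.argmax().item()])
```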