Multi-Classifier Interactive Learning for Ambiguous Speech Emotion
Recognition
- URL: http://arxiv.org/abs/2012.05429v2
- Date: Sat, 12 Dec 2020 14:59:33 GMT
- Title: Multi-Classifier Interactive Learning for Ambiguous Speech Emotion
Recognition
- Authors: Ying Zhou, Xuefeng Liang, Yu Gu, Yifei Yin, Longshan Yao
- Abstract summary: We propose a novel multi-classifier interactive learning (MCIL) method to address ambiguous speech emotions.
MCIL mimics several individuals who have inconsistent cognitions of ambiguous emotions and constructs new ambiguous labels.
Experiments show that MCIL not only improves each classifier's performance but also raises their recognition consistency from moderate to substantial.
- Score: 9.856709988128515
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, speech emotion recognition technology has become of
great significance in industrial applications such as call centers, social
robots, and health care. Combining speech recognition with speech emotion
recognition can improve feedback efficiency and the quality of service. Thus,
speech emotion recognition has attracted much attention in both industry and
academia. Since the emotions present in an entire utterance may occur with
varied probabilities, speech emotion is likely to be ambiguous, which poses
great challenges to recognition tasks. However, previous studies commonly
assigned a single label or multiple labels to each utterance with certainty.
Their algorithms therefore suffer low accuracy because of this inappropriate
representation. Inspired by the optimally interacting theory, we address
ambiguous speech emotions by proposing a novel multi-classifier interactive
learning (MCIL) method. In MCIL, multiple different classifiers first mimic
several individuals who have inconsistent cognitions of ambiguous emotions and
construct new ambiguous labels (emotion probability distributions). Then, they
are retrained with the new labels so that their cognitions interact. This
procedure enables each classifier to learn better representations of ambiguous
data from the others and further improves its recognition ability. Experiments
on three benchmark corpora (MAS, IEMOCAP, and FAU-AIBO) demonstrate that MCIL
not only improves each classifier's performance but also raises their
recognition consistency from moderate to substantial.
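As a concrete illustration of the interaction loop described in the abstract, here is a minimal Python sketch. The classifier family (logistic regressions), the plain averaging used to merge the individual predictions into a soft label, the hardening of that label for retraining, and the toy data are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def mcil_round(classifiers, features):
    """One interaction round: every classifier votes an emotion probability
    distribution, the votes are merged into a new ambiguous (soft) label, and
    each classifier is retrained so it absorbs the others' cognition."""
    # 1. Each "individual" produces its own emotion probability distribution.
    votes = np.stack([clf.predict_proba(features) for clf in classifiers])
    # 2. The inconsistent cognitions are merged into one ambiguous label per utterance.
    soft_labels = votes.mean(axis=0)
    # 3. Retrain every classifier on the shared view. sklearn estimators need hard
    #    targets, so the soft label is hardened here; a neural classifier could
    #    instead minimise cross-entropy against soft_labels directly.
    targets = soft_labels.argmax(axis=1)
    return [clone(clf).fit(features, targets) for clf in classifiers], soft_labels


# Toy usage with random stand-ins for acoustic features and 4 emotion classes.
rng = np.random.default_rng(0)
y = rng.integers(0, 4, size=200)
X = rng.normal(size=(200, 16))
X[:, 0] += y  # weak class signal so the toy classifiers do not collapse
classifiers = [LogisticRegression(max_iter=500, C=c).fit(X, y) for c in (0.1, 1.0, 10.0)]
for _ in range(3):
    classifiers, soft_labels = mcil_round(classifiers, X)
```

In this sketch the differently regularized classifiers play the role of the "individuals" with inconsistent cognitions; after a few rounds their predictions (and hence the constructed ambiguous labels) become increasingly consistent.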
Related papers
- The Emotion is Not One-hot Encoding: Learning with Grayscale Label for
Emotion Recognition in Conversation [0.0]
In emotion recognition in conversation (ERC), the emotion of the current utterance is predicted by considering the previous context.
We introduce several methods for constructing grayscale labels and confirm that each method improves the emotion recognition performance.
arXiv Detail & Related papers (2022-06-15T08:14:42Z) - Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on
Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z) - Attention-based Region of Interest (ROI) Detection for Speech Emotion
Recognition [4.610756199751138]
We propose to use an attention mechanism in deep recurrent neural networks to detect the Regions of Interest (ROI) that are more emotionally salient in human emotional speech/video.
We compare the performance of the proposed attention networks with state-of-the-art LSTM models on the multi-class classification task of recognizing six basic human emotions.
arXiv Detail & Related papers (2022-03-03T22:01:48Z) - Emotion Intensity and its Control for Emotional Voice Conversion [77.05097999561298]
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity.
In this paper, we aim to explicitly characterize and control the intensity of emotion.
We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding.
arXiv Detail & Related papers (2022-01-10T02:11:25Z) - Multimodal Emotion Recognition with High-level Speech and Text Features [8.141157362639182]
We propose a novel cross-representation speech model to perform emotion recognition on wav2vec 2.0 speech features.
We also train a CNN-based model to recognize emotions from text features extracted with Transformer-based models.
Our method is evaluated on the IEMOCAP dataset in a 4-class classification problem.
arXiv Detail & Related papers (2021-09-29T07:08:40Z) - Emotion Recognition from Multiple Modalities: Fundamentals and
Methodologies [106.62835060095532]
We discuss several key aspects of multi-modal emotion recognition (MER).
We begin with a brief introduction on widely used emotion representation models and affective modalities.
We then summarize existing emotion annotation strategies and corresponding computational tasks.
Finally, we outline several real-world applications and discuss some future directions.
arXiv Detail & Related papers (2021-08-18T21:55:20Z) - A Circular-Structured Representation for Visual Emotion Distribution
Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z) - SpanEmo: Casting Multi-label Emotion Classification as Span-prediction [15.41237087996244]
We propose a new model "SpanEmo" casting multi-label emotion classification as span-prediction.
We introduce a loss function focused on modelling multiple co-existing emotions in the input sentence.
Experiments performed on the SemEval2018 multi-label emotion data over three language sets demonstrate our method's effectiveness.
arXiv Detail & Related papers (2021-01-25T12:11:04Z) - Facial Emotion Recognition with Noisy Multi-task Annotations [88.42023952684052]
We introduce a new problem of facial emotion recognition with noisy multi-task annotations.
For this new problem, we suggest a formulation from the viewpoint of joint distribution matching.
We exploit a new method to enable emotion prediction and joint distribution learning.
arXiv Detail & Related papers (2020-10-19T20:39:37Z) - COSMIC: COmmonSense knowledge for eMotion Identification in
Conversations [95.71018134363976]
We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations.
We show that COSMIC achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets.
arXiv Detail & Related papers (2020-10-06T15:09:38Z) - x-vectors meet emotions: A study on dependencies between emotion and
speaker recognition [38.181055783134006]
We show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning.
For emotion recognition, we show that using a simple linear model is enough to obtain good performance on the features extracted from pre-trained models.
We present results on the effect of emotion on speaker verification.
arXiv Detail & Related papers (2020-02-12T15:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.