Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus
Speech Emotion Recognition
- URL: http://arxiv.org/abs/2308.02190v1
- Date: Fri, 4 Aug 2023 08:15:17 GMT
- Title: Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus
Speech Emotion Recognition
- Authors: Jiaxin Ye and Yujie Wei and Xin-Cheng Wen and Chenglong Ma and
Zhizhong Huang and Kunhong Liu and Hongming Shan
- Abstract summary: Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one.
Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment.
We propose a novel Emotion Decoupling aNd Alignment learning framework (EMO-DNA) for cross-corpus SER.
- Score: 16.159171586384023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-corpus speech emotion recognition (SER) seeks to generalize the ability
of inferring speech emotion from a well-labeled corpus to an unlabeled one,
which is a rather challenging task due to the significant discrepancy between
two corpora. Existing methods, typically based on unsupervised domain
adaptation (UDA), struggle to learn corpus-invariant features by global
distribution alignment, but unfortunately, the resulting features are mixed
with corpus-specific features or not class-discriminative. To tackle these
challenges, we propose the Emotion Decoupling aNd Alignment learning framework
(EMO-DNA), a novel UDA method for cross-corpus SER that learns
emotion-relevant corpus-invariant features. The novelties of EMO-DNA are
two-fold: contrastive emotion decoupling and dual-level emotion alignment. On
one hand, our contrastive emotion decoupling achieves decoupling learning via a
contrastive decoupling loss to strengthen the separability of emotion-relevant
features from corpus-specific ones. On the other hand, our dual-level emotion
alignment introduces an adaptive threshold pseudo-labeling to select confident
target samples for class-level alignment, and performs corpus-level alignment
to jointly guide the model toward learning class-discriminative corpus-invariant
features across corpora. Extensive experimental results demonstrate the
superior performance of EMO-DNA over the state-of-the-art methods in several
cross-corpus scenarios. Source code is available at
https://github.com/Jiaxin-Ye/Emo-DNA.
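The adaptive-threshold pseudo-labeling described in the abstract selects confident target samples for class-level alignment. The idea can be sketched as follows; the function name and the EMA-style per-class threshold update are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_pseudo_labels(probs, base_threshold=0.9, momentum=0.999,
                         class_thresholds=None):
    """Select confident unlabeled target samples via per-class adaptive
    thresholds. `probs` is an (N, C) array of softmax outputs.
    Returns (selected indices, pseudo-labels, updated thresholds).
    The EMA-style threshold update rule is an illustrative assumption.
    """
    n, c = probs.shape
    if class_thresholds is None:
        class_thresholds = np.full(c, base_threshold)

    confidence = probs.max(axis=1)   # top-1 confidence per sample
    pseudo = probs.argmax(axis=1)    # tentative pseudo-label

    # Adapt each class's threshold toward its current mean confidence,
    # so easy classes do not dominate class-level alignment.
    for k in range(c):
        mask = pseudo == k
        if mask.any():
            class_thresholds[k] = (momentum * class_thresholds[k]
                                   + (1 - momentum) * confidence[mask].mean())

    keep = confidence >= class_thresholds[pseudo]
    return np.where(keep)[0], pseudo[keep], class_thresholds
```

Only samples whose top-1 confidence clears their predicted class's threshold would then participate in class-level alignment.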
Related papers
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation [23.309174697717374]
Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation.
We propose an Emotion-Anchored Contrastive Learning framework that can generate more distinguishable utterance representations for similar emotions.
Our proposed EACL achieves state-of-the-art emotion recognition performance and exhibits superior performance on similar emotions.
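The anchoring idea can be sketched as a generic anchor-based contrastive objective, where each utterance embedding is pulled toward its emotion's anchor and pushed away from the rest; this is an illustration of the general technique, and EACL's exact loss may differ:

```python
import numpy as np

def anchored_contrastive_loss(embeddings, labels, anchors, temperature=0.1):
    """Anchor-based contrastive loss (generic sketch, not EACL's exact form).

    embeddings: (N, D) L2-normalized utterance representations
    anchors:    (C, D) L2-normalized per-emotion anchor vectors
    labels:     (N,)   integer emotion labels
    """
    logits = embeddings @ anchors.T / temperature    # (N, C) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against each sample's own emotion anchor.
    return -log_probs[np.arange(len(labels)), labels].mean()
```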
arXiv Detail & Related papers (2024-03-29T17:00:55Z)
- Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion [81.1492897350032]
Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components.
We propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion.
arXiv Detail & Related papers (2023-12-29T08:06:45Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition [19.281716812246557]
We propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN).
DIDAN deals with cross-corpus speech emotion recognition problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora.
To evaluate the proposed DIDAN, extensive cross-corpus SER experiments on widely-used speech emotion corpora are carried out.
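For context, a classical explicit distribution-alignment penalty such as maximum mean discrepancy (MMD) can be sketched as below; DIDAN aligns distributions implicitly rather than with this loss, so this is a point of contrast, not DIDAN's method:

```python
import numpy as np

def mmd_rbf(source, target, gamma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel: a standard
    explicit penalty between source and target feature batches.
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> RBF kernel matrix.
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)

    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())
```

The penalty is zero when the two batches coincide and grows as the feature distributions drift apart.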
arXiv Detail & Related papers (2023-02-17T14:51:37Z)
- Seeking Subjectivity in Visual Emotion Distribution Learning [93.96205258496697]
Visual Emotion Analysis (VEA) aims to predict people's emotions towards different visual stimuli.
Existing methods often predict visual emotion distribution in a unified network, neglecting the inherent subjectivity in its crowd voting process.
We propose a novel Subjectivity Appraise-and-Match Network (SAMNet) to investigate the subjectivity in visual emotion distribution.
arXiv Detail & Related papers (2022-07-25T02:20:03Z)
- When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework [60.51225419301642]
We propose an Emotion Guided Similarity Network (EGS-Net) to address the diversity of human emotions in practical scenarios.
EGS-Net consists of an emotion branch and a similarity branch, based on a two-stage learning framework.
Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method against several state-of-the-art methods.
arXiv Detail & Related papers (2022-01-18T07:24:12Z)
- A Circular-Structured Representation for Visual Emotion Distribution Learning [82.89776298753661]
We propose a well-grounded circular-structured representation to utilize the prior knowledge for visual emotion distribution learning.
To be specific, we first construct an Emotion Circle to unify any emotional state within it.
On the proposed Emotion Circle, each emotion distribution is represented with an emotion vector, which is defined with three attributes.
arXiv Detail & Related papers (2021-06-23T14:53:27Z)
- SpanEmo: Casting Multi-label Emotion Classification as Span-prediction [15.41237087996244]
We propose a new model "SpanEmo" casting multi-label emotion classification as span-prediction.
We introduce a loss function focused on modelling multiple co-existing emotions in the input sentence.
Experiments performed on the SemEval2018 multi-label emotion data over three language sets demonstrate our method's effectiveness.
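A simplified stand-in for such a multi-label objective is elementwise binary cross-entropy over all candidate emotions, which lets several emotions co-exist in one sentence; SpanEmo's actual span-prediction loss is more elaborate than this sketch:

```python
import numpy as np

def multilabel_emotion_loss(logits, targets):
    """Binary cross-entropy over all emotion labels at once, so multiple
    co-existing emotions can be active for a single sentence.

    logits:  (N, C) raw scores, one per candidate emotion label
    targets: (N, C) 0/1 multi-hot emotion annotations
    """
    probs = 1.0 / (1.0 + np.exp(-logits))    # elementwise sigmoid
    eps = 1e-12                              # guard against log(0)
    bce = -(targets * np.log(probs + eps)
            + (1 - targets) * np.log(1 - probs + eps))
    return bce.mean()
```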
arXiv Detail & Related papers (2021-01-25T12:11:04Z)
- A Generalized Zero-Shot Framework for Emotion Recognition from Body Gestures [5.331671302839567]
We introduce a Generalized Zero-Shot Learning (GZSL) framework to infer the emotional state of new body gestures.
The framework is significantly superior to the traditional method of emotion classification and state-of-the-art zero-shot learning methods.
arXiv Detail & Related papers (2020-10-13T13:16:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.