Supervised Prototypical Contrastive Learning for Emotion Recognition in
Conversation
- URL: http://arxiv.org/abs/2210.08713v2
- Date: Wed, 19 Oct 2022 08:52:55 GMT
- Title: Supervised Prototypical Contrastive Learning for Emotion Recognition in
Conversation
- Authors: Xiaohui Song, Longtao Huang, Hui Xue, Songlin Hu
- Abstract summary: We propose a Supervised Prototypical Contrastive Learning (SPCL) loss for the emotion recognition task.
We design a difficulty measure function based on the distance between classes and introduce curriculum learning to alleviate the impact of extreme samples.
We achieve state-of-the-art results on three widely used benchmarks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Capturing emotions within a conversation plays an essential role in modern
dialogue systems. However, the weak correlation between emotions and semantics
brings many challenges to emotion recognition in conversation (ERC). Even for
semantically similar utterances, the emotion may vary drastically depending on
the context or speaker. In this paper, we propose a Supervised Prototypical
Contrastive Learning (SPCL) loss for the ERC task. Leveraging the Prototypical
Network, SPCL targets the imbalanced classification problem through
contrastive learning and does not require a large batch size.
Meanwhile, we design a difficulty measure function based on the distance
between classes and introduce curriculum learning to alleviate the impact of
extreme samples. We achieve state-of-the-art results on three widely used
benchmarks. Further, we conduct analytical experiments to demonstrate the
effectiveness of our proposed SPCL and curriculum learning strategy. We release
the code at https://github.com/caskcsg/SPCL.
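The two core ideas of the abstract, prototype-based contrastive learning and a distance-based difficulty measure for curriculum learning, can be sketched concretely. The snippet below is a minimal, illustrative NumPy sketch, not the authors' released implementation (see the linked repository for that): it builds class prototypes as normalized mean embeddings, contrasts each sample against the prototypes rather than against other in-batch samples (so no large batch is needed), and derives a per-sample difficulty proxy from inter-class prototype distances. All function names and exact formulas here are assumptions for illustration.

```python
import numpy as np

def class_prototypes(embeddings, labels, num_classes):
    """Mean embedding per class, L2-normalized."""
    protos = np.zeros((num_classes, embeddings.shape[1]))
    for c in range(num_classes):
        protos[c] = embeddings[labels == c].mean(axis=0)
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def spcl_like_loss(embeddings, labels, protos, temperature=0.1):
    """Contrast each sample against class prototypes instead of all other
    in-batch samples, so small batches suffice (prototype-based SCL)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ protos.T / temperature          # (batch, num_classes)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each sample's own class prototype.
    return -log_probs[np.arange(len(labels)), labels].mean()

def difficulty(protos, labels):
    """Per-sample difficulty proxy: samples from classes whose prototype
    lies close to another class's prototype are considered harder."""
    dists = np.linalg.norm(protos[:, None] - protos[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    margin = dists.min(axis=1)   # distance to the nearest other class
    return 1.0 / margin[labels]  # larger value = harder sample
```

A curriculum schedule would then present samples in increasing order of this difficulty score, which mirrors the paper's idea of ordering training by a class-distance-based measure to soften the impact of extreme samples.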
Related papers
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation [23.309174697717374]
Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation.
We propose an Emotion-Anchored Contrastive Learning framework that can generate more distinguishable utterance representations for similar emotions.
Our proposed EACL achieves state-of-the-art emotion recognition performance and exhibits superior performance on similar emotions.
arXiv Detail & Related papers (2024-03-29T17:00:55Z)
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling [50.99252242917458]
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.
To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity.
Our model outperforms the baseline models in understanding and rendering emotions.
arXiv Detail & Related papers (2023-12-19T08:47:50Z)
- ERNetCL: A novel emotion recognition network in textual conversation based on curriculum learning strategy [37.41082775317849]
We propose a novel emotion recognition network based on a curriculum learning strategy (ERNetCL).
The proposed ERNetCL primarily consists of a temporal encoder (TE), a spatial encoder (SE), and a curriculum learning (CL) loss.
Our proposed method is effective and substantially outperforms other baseline models.
arXiv Detail & Related papers (2023-08-12T03:05:44Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Cluster-Level Contrastive Learning for Emotion Recognition in Conversations [13.570186295041644]
A key challenge for Emotion Recognition in Conversations (ERC) is to distinguish semantically similar emotions.
Some works utilise Supervised Contrastive Learning (SCL), which uses categorical emotion labels as supervision signals and contrasts samples in a high-dimensional semantic space.
We propose a novel low-dimensional Supervised Cluster-level Contrastive Learning (SCCL) method, which first reduces the high-dimensional SCL space to a three-dimensional affect representation space.
arXiv Detail & Related papers (2023-02-07T14:49:20Z)
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
- Hybrid Curriculum Learning for Emotion Recognition in Conversation [10.912215835115063]
Our framework consists of two curricula: (1) a conversation-level curriculum (CC); and (2) an utterance-level curriculum (UC).
With the proposed model-agnostic hybrid curriculum learning strategy, we observe significant performance boosts over a wide range of existing ERC models.
arXiv Detail & Related papers (2021-12-22T08:02:58Z)
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z)
- SpanEmo: Casting Multi-label Emotion Classification as Span-prediction [15.41237087996244]
We propose a new model, SpanEmo, which casts multi-label emotion classification as span prediction.
We introduce a loss function focused on modelling multiple co-existing emotions in the input sentence.
Experiments performed on the SemEval2018 multi-label emotion data over three language sets demonstrate our method's effectiveness.
arXiv Detail & Related papers (2021-01-25T12:11:04Z)
- COSMIC: COmmonSense knowledge for eMotion Identification in Conversations [95.71018134363976]
We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations.
We show that COSMIC achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets.
arXiv Detail & Related papers (2020-10-06T15:09:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.