Cluster-Level Contrastive Learning for Emotion Recognition in
Conversations
- URL: http://arxiv.org/abs/2302.03508v1
- Date: Tue, 7 Feb 2023 14:49:20 GMT
- Title: Cluster-Level Contrastive Learning for Emotion Recognition in
Conversations
- Authors: Kailai Yang, Tianlin Zhang, Hassan Alhuzali, Sophia Ananiadou
- Abstract summary: A key challenge for Emotion Recognition in Conversations (ERC) is to distinguish semantically similar emotions.
Some works utilise Supervised Contrastive Learning (SCL), which uses categorical emotion labels as supervision signals and contrasts in a high-dimensional semantic space.
We propose a novel low-dimensional Supervised Cluster-level Contrastive Learning (SCCL) method, which first reduces the high-dimensional SCL space to a three-dimensional affect representation space.
- Score: 13.570186295041644
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A key challenge for Emotion Recognition in Conversations (ERC) is to
distinguish semantically similar emotions. Some works utilise Supervised
Contrastive Learning (SCL) which uses categorical emotion labels as supervision
signals and contrasts in high-dimensional semantic space. However, categorical
labels fail to provide quantitative information between emotions. ERC is also
not equally dependent on all embedded features in the semantic space, which
makes the high-dimensional SCL inefficient. To address these issues, we propose
a novel low-dimensional Supervised Cluster-level Contrastive Learning (SCCL)
method, which first reduces the high-dimensional SCL space to a
three-dimensional affect representation space Valence-Arousal-Dominance (VAD),
then performs cluster-level contrastive learning to incorporate measurable
emotion prototypes. To help model the dialogue and enrich the context,
we leverage pre-trained knowledge adapters to infuse linguistic and factual
knowledge. Experiments show that our method achieves new state-of-the-art
results with 69.81% on IEMOCAP, 65.7% on MELD, and 62.51% on DailyDialog
datasets. The analysis also proves that the VAD space is not only suitable for
ERC but also interpretable, with VAD prototypes enhancing its performance and
stabilising the training of SCCL. In addition, the pre-trained knowledge
adapters benefit the performance of the utterance encoder and SCCL. Our code is
available at: https://github.com/SteveKGYang/SCCL
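The core idea above, projecting utterance embeddings into a 3-D Valence-Arousal-Dominance space and contrasting them against per-emotion prototypes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, temperature, and random prototype initialisation are placeholders (the paper derives prototype values from VAD lexicons), and the exact loss formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VADContrastiveHead(nn.Module):
    """Projects utterance embeddings into a 3-D VAD space and computes a
    cluster-level supervised contrastive loss against emotion prototypes."""

    def __init__(self, hidden_dim, num_emotions, temperature=0.1):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 3)  # Valence-Arousal-Dominance
        # Learnable emotion prototypes in the same 3-D space; randomly
        # initialised here for illustration only.
        self.prototypes = nn.Parameter(torch.randn(num_emotions, 3))
        self.temperature = temperature

    def forward(self, embeddings, labels):
        z = F.normalize(self.proj(embeddings), dim=-1)   # (B, 3)
        protos = F.normalize(self.prototypes, dim=-1)    # (C, 3)
        logits = z @ protos.t() / self.temperature       # (B, C)
        # Pull each utterance toward its own emotion prototype and push
        # it away from the others: a cluster-level contrastive objective.
        return F.cross_entropy(logits, labels)

head = VADContrastiveHead(hidden_dim=768, num_emotions=6)
loss = head(torch.randn(4, 768), torch.tensor([0, 2, 1, 5]))
```

Contrasting against a handful of prototypes instead of all in-batch pairs is what makes the objective "cluster-level": the number of comparisons grows with the number of emotion classes, not the batch size.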
Related papers
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the state of the art while being nearly 220 times faster in computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523]
We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification.
We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
arXiv Detail & Related papers (2024-04-26T07:30:32Z)
- Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation [114.72734384299476]
We propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information.
We leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.
Our approach significantly boosts the capacity of segmentation models for unseen classes.
arXiv Detail & Related papers (2024-03-13T11:23:55Z)
- SSLCL: An Efficient Model-Agnostic Supervised Contrastive Learning Framework for Emotion Recognition in Conversations [20.856739541819056]
Emotion recognition in conversations (ERC) is a rapidly evolving task within the natural language processing community.
We propose an efficient and model-agnostic SCL framework named Supervised Sample-Label Contrastive Learning with Soft-HGR Maximal Correlation (SSLCL)
We introduce a novel perspective on utilizing label representations by projecting discrete labels into dense embeddings through a shallow multilayer perceptron.
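The label-projection idea in this summary can be sketched in a few lines: discrete emotion labels are mapped to dense embeddings by a shallow MLP, so samples can be contrasted directly against label representations rather than other samples. Embedding and layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LabelEmbedder(nn.Module):
    """Projects discrete emotion labels into dense embeddings via a
    shallow MLP, in the spirit of SSLCL's sample-label contrast."""

    def __init__(self, num_labels, embed_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Embedding(num_labels, embed_dim),  # discrete label -> dense vector
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, labels):
        return self.mlp(labels)  # (B, embed_dim)

embedder = LabelEmbedder(num_labels=7)
label_vecs = embedder(torch.tensor([0, 3, 6]))  # one dense vector per label
```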
arXiv Detail & Related papers (2023-10-25T14:41:14Z)
- ERNetCL: A novel emotion recognition network in textual conversation based on curriculum learning strategy [37.41082775317849]
We propose a novel emotion recognition network based on curriculum learning strategy (ERNetCL)
The proposed ERNetCL primarily consists of temporal encoder (TE), spatial encoder (SE), and curriculum learning (CL) loss.
Our proposed method is effective and substantially outperforms baseline models.
arXiv Detail & Related papers (2023-08-12T03:05:44Z)
- Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations [24.542445315345464]
We propose a framework for learning class-spread structured representations in a supervised manner.
It can effectively utilize label-level feature consistency and retain fine-grained intra-class features.
Under the framework with CAT, we develop a sequence-based SACL-LSTM to learn label-consistent and context-robust features.
arXiv Detail & Related papers (2023-06-02T12:52:38Z)
- TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z)
- Supervised Prototypical Contrastive Learning for Emotion Recognition in Conversation [25.108385802645163]
We propose a Supervised Prototypical Contrastive Learning (SPCL) loss for the emotion recognition task.
We design a difficulty measure function based on the distance between classes and introduce curriculum learning to alleviate the impact of extreme samples.
We achieve state-of-the-art results on three widely used benchmarks.
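A distance-based difficulty measure of the kind this summary describes can be sketched as follows: a sample counts as hard when it lies far from its own class centroid relative to the nearest other class, and curriculum learning then presents samples from easy to hard. This is an assumption-laden illustration of the general idea, not the paper's exact function.

```python
import torch

def difficulty_scores(embeddings, labels, num_classes):
    """Sketch of a distance-based difficulty measure: ratio of a sample's
    distance to its own class centroid over its distance to the nearest
    other class centroid. Higher score -> harder sample. Assumes every
    class appears at least once in the batch."""
    centroids = torch.stack([
        embeddings[labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                      # (C, D)
    dists = torch.cdist(embeddings, centroids)              # (B, C)
    own = dists.gather(1, labels.unsqueeze(1)).squeeze(1)   # distance to own class
    # Mask out the own-class column, then take the nearest other class.
    other = dists.scatter(1, labels.unsqueeze(1), float("inf")).min(dim=1).values
    return own / (other + 1e-8)

# Curriculum ordering: train on samples sorted from easy to hard.
def curriculum_order(embeddings, labels, num_classes):
    return torch.argsort(difficulty_scores(embeddings, labels, num_classes))
```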
arXiv Detail & Related papers (2022-10-17T03:08:23Z)
- Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
We propose to learn metric spaces which encode semantic similarities as embedding space.
These spaces should be transferable to classes beyond those seen during training.
This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
- Contrastive Unsupervised Learning for Speech Emotion Recognition [22.004507213531102]
Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication.
We show that the contrastive predictive coding (CPC) method can learn salient representations from unlabeled datasets.
arXiv Detail & Related papers (2021-02-12T06:06:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.