A Hierarchical Regression Chain Framework for Affective Vocal Burst
Recognition
- URL: http://arxiv.org/abs/2303.08027v1
- Date: Tue, 14 Mar 2023 16:08:45 GMT
- Title: A Hierarchical Regression Chain Framework for Affective Vocal Burst
Recognition
- Authors: Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen
Meng
- Abstract summary: We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
- Score: 72.36055502078193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a common way of emotion signaling via non-linguistic vocalizations, vocal
burst (VB) plays an important role in daily social interaction. Understanding
and modeling human vocal bursts are indispensable for developing robust and
general artificial intelligence. Exploring computational approaches for
understanding vocal bursts is attracting increasing research attention. In this
work, we propose a hierarchical framework, based on chain regression models,
for affective recognition from VBs, which explicitly considers multiple
relationships: (i) between emotional states and diverse cultures; (ii) between
low-dimensional (arousal & valence) and high-dimensional (10 emotion classes)
emotion spaces; and (iii) between various emotion classes within the
high-dimensional space. To address the challenge of data sparsity, we also use
self-supervised learning (SSL) representations with layer-wise and temporal
aggregation modules. The proposed systems participated in the ACII Affective
Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE"
tasks. Experimental results based on the ACII Challenge 2022 dataset
demonstrate the superior performance of the proposed system and the
effectiveness of considering multiple relationships using hierarchical
regression chain models.
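The two key ideas in the abstract, aggregating SSL layer representations and chaining the low-dimensional (arousal/valence) predictions into the high-dimensional (10-class) head, can be illustrated with a minimal sketch. This is not the authors' exact architecture: the data are synthetic, and `Ridge` regressors stand in for the paper's neural prediction heads.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Layer-wise aggregation: a weighted sum over the hidden layers of a
# (hypothetical) SSL encoder, standing in for the paper's aggregation module.
layer_feats = rng.normal(size=(13, 200, 64))  # (layers, utterances, dim)
w = np.exp(rng.normal(size=13))
w /= w.sum()                                  # softmax-style layer weights
X = np.tensordot(w, layer_feats, axes=1)      # pooled features, shape (200, 64)

y_av = rng.normal(size=(200, 2))    # arousal & valence targets (synthetic)
y_emo = rng.normal(size=(200, 10))  # 10 emotion-class intensities (synthetic)

# Stage 1: predict the low-dimensional (arousal/valence) emotion space.
stage1 = Ridge().fit(X, y_av)
av_pred = stage1.predict(X)

# Stage 2: chain the high-dimensional head on the stage-1 predictions,
# modeling relationship (ii) between the two emotion spaces.
stage2 = Ridge().fit(np.hstack([X, av_pred]), y_emo)
emo_pred = stage2.predict(np.hstack([X, av_pred]))
```

In a chain regression, each later predictor conditions on earlier outputs, so errors and dependencies in the low-dimensional space propagate explicitly into the high-dimensional predictions rather than being modeled independently.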
Related papers
- A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition [41.05066959632938] (2024-07-06)
  Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. We propose a novel layer-anchoring mechanism to facilitate emotion transfer in SER tasks. Our approach is evaluated on two affective corpora in distinct languages.
- Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences [4.740624855896404] (2024-05-08)
  We propose a contrastive learning framework utilizing selective strong augmentation for self-supervised gait-based emotion representation. Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms state-of-the-art methods under different evaluation protocols.
- Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer [78.35816158511523] (2024-04-26)
  We present a single-stage emotion recognition approach, employing a Decoupled Subject-Context Transformer (DSCT) for simultaneous subject localization and emotion classification. We evaluate our single-stage framework on two widely used context-aware emotion recognition datasets, CAER-S and EMOTIC.
- Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations [15.705757672984662] (2023-12-11)
  Multimodal Emotion Recognition in Conversations (MERC) is a significant development direction for machine intelligence. MERC data naturally exhibit an imbalanced distribution of emotion categories, yet prior work has largely ignored the negative impact of this imbalance on emotion recognition. We propose the Class Boundary Enhanced Representation Learning (CBERL) model to address the imbalanced distribution of emotion categories in raw data. Extensive experiments on the IEMOCAP and MELD benchmark datasets show that CBERL improves the effectiveness of emotion recognition.
- EmotionIC: Emotional Inertia and Contagion-driven Dependency Modeling for Emotion Recognition in Conversation [34.24557248359872] (2023-03-20)
  We propose an emotional inertia and contagion-driven dependency modeling approach (EmotionIC) for the ERC task. EmotionIC consists of three main components: Identity Masked Multi-Head Attention (IMMHA), Dialogue-based Gated Recurrent Unit (DiaGRU), and Skip-chain Conditional Random Field (SkipCRF). Experimental results show that our method significantly outperforms state-of-the-art models on four benchmark datasets.
- M2R2: Missing-Modality Robust Emotion Recognition Framework with Iterative Data Augmentation [6.962213869946514] (2022-05-05)
  We propose Missing-Modality Robust emotion Recognition (M2R2), which trains an emotion recognition model with iterative data augmentation using a learned common representation. A Party Attentive Network (PANet) is designed to classify emotions by tracking all speakers' states and context.
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based Models [53.31917090073727] (2022-02-16)
  We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from the speech and text modalities. We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652] (2021-10-27)
  We propose a pre-training model, MEmoBERT, for multimodal emotion recognition. Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as masked-text prediction. Our proposed MEmoBERT significantly enhances emotion recognition performance.
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446] (2021-08-05)
  Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208] (2020-07-19)
  We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme: two different types of generative adversarial networks learn collaboratively throughout the training process.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.