The ACII 2022 Affective Vocal Bursts Workshop & Competition:
Understanding a critically understudied modality of emotional expression
- URL: http://arxiv.org/abs/2207.03572v1
- Date: Thu, 7 Jul 2022 21:09:35 GMT
- Authors: Alice Baird, Panagiotis Tzirakis, Jeffrey A. Brooks, Christopher B.
Gregory, Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen
- Abstract summary: This paper describes the four tracks and baseline systems, which use state-of-the-art machine learning methods.
This year's competition comprises four tracks using a dataset of 59,299 vocalizations from 1,702 speakers.
The baseline performance for each track is obtained by utilizing an end-to-end deep learning model.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ACII Affective Vocal Bursts Workshop & Competition is focused on
understanding multiple affective dimensions of vocal bursts: laughs, gasps,
cries, screams, and many other non-linguistic vocalizations central to the
expression of emotion and to human communication more generally. This year's
competition comprises four tracks using a large-scale and in-the-wild dataset
of 59,299 vocalizations from 1,702 speakers. The first, the A-VB-High task,
requires competition participants to perform a multi-label regression on a
novel model for emotion, utilizing ten classes of richly annotated emotional
expression intensities, including Awe, Fear, and Surprise. The second, the
A-VB-Two task, utilizes the more conventional two-dimensional model of emotion:
arousal and valence. The third, the A-VB-Culture task, requires participants
to explore the cultural aspects of the dataset, training native-country
dependent models. Finally, for the fourth task, A-VB-Type, participants should
recognize the type of vocal burst (e.g., laughter, cry, grunt) in an 8-class
classification task. This paper describes the four tracks and baseline systems,
which use state-of-the-art machine learning methods. The baseline performance
for each track is obtained by utilizing an end-to-end deep learning model and
is as follows: for A-VB-High, a mean (over the 10 dimensions) Concordance
Correlation Coefficient (CCC) of 0.5687 is obtained; for A-VB-Two, a mean
(over the 2 dimensions) CCC of 0.5084 is obtained; for A-VB-Culture, a mean
CCC across the four cultures of 0.4401 is obtained; and for A-VB-Type, the
baseline Unweighted Average Recall (UAR) over the 8 classes is 0.4172.
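For readers who want to reproduce the two evaluation metrics quoted above, the sketch below shows one conventional way to compute the mean CCC (used for A-VB-High, A-VB-Two, and A-VB-Culture) and the UAR (used for A-VB-Type) with NumPy. The function names and array shapes are illustrative assumptions, not the challenge organizers' reference implementation.

    import numpy as np

    def ccc(y_true, y_pred):
        # Concordance Correlation Coefficient of two 1-D score arrays, using
        # population statistics: 2*cov / (var_t + var_p + (mean gap)^2).
        mu_t, mu_p = y_true.mean(), y_pred.mean()
        var_t, var_p = y_true.var(), y_pred.var()
        cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
        return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

    def mean_ccc(Y_true, Y_pred):
        # Mean CCC over emotion dimensions (columns), e.g. the 10 A-VB-High
        # intensities or the 2 A-VB-Two dimensions; shape (n_samples, n_dims).
        return float(np.mean([ccc(Y_true[:, d], Y_pred[:, d])
                              for d in range(Y_true.shape[1])]))

    def uar(labels, preds, n_classes=8):
        # Unweighted Average Recall: per-class recall averaged with equal
        # class weights, e.g. over the 8 A-VB-Type vocal-burst categories.
        recalls = [(preds[labels == c] == c).mean()
                   for c in range(n_classes) if (labels == c).any()]
        return float(np.mean(recalls))

For context, a chance-level predictor would score near 0.0 CCC and, for a balanced 8-class task, 1/8 = 0.125 UAR, against which the reported baselines of 0.5687 CCC and 0.4172 UAR can be compared.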
Related papers
- 7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition
This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition.
The ABAW Competition addresses novel challenges in understanding human expressions and behaviors.
arXiv Detail & Related papers (2024-07-04T11:04:29Z)
- The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition
This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.
The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors.
arXiv Detail & Related papers (2024-02-29T16:49:38Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges
The 5th Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop, held in conjunction with the IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023.
For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset.
The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities.
arXiv Detail & Related papers (2023-03-02T18:58:15Z)
- Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst
This technical report presents our emotion recognition pipeline for the high-dimensional emotion task (A-VB High) in the ACII Affective Vocal Bursts (A-VB) 2022 Workshop & Competition.
In empirical experiments, our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, compared to 0.5686 for the baseline model.
arXiv Detail & Related papers (2022-09-15T22:06:42Z)
- Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
ExVo 2022 included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers.
The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts.
The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions.
arXiv Detail & Related papers (2022-07-14T14:30:34Z)
- Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts
Burst2Vec uses pre-trained speech representations to capture acoustic information from raw waveforms.
Our models achieve a relative 30% performance gain over baselines using pre-extracted features.
arXiv Detail & Related papers (2022-06-24T18:57:41Z)
- The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
ExVo 2022 includes three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers.
This paper describes the three tracks and provides performance measures for baseline models using state-of-the-art machine learning strategies.
arXiv Detail & Related papers (2022-05-03T21:06:44Z)