Automatic Analysis of the Emotional Content of Speech in Daylong
Child-Centered Recordings from a Neonatal Intensive Care Unit
- URL: http://arxiv.org/abs/2106.09539v1
- Date: Mon, 14 Jun 2021 11:17:52 GMT
- Title: Automatic Analysis of the Emotional Content of Speech in Daylong
Child-Centered Recordings from a Neonatal Intensive Care Unit
- Authors: Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen
- Abstract summary: Hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia.
We introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data.
We show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall.
- Score: 3.7373314439051106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have recently started to study how the emotional speech heard by
young infants can affect their developmental outcomes. As a part of this
research, hundreds of hours of daylong recordings from preterm infants' audio
environments were collected from two hospitals in Finland and Estonia in the
context of the so-called APPLE study. In order to analyze the emotional content of
speech in such a massive dataset, an automatic speech emotion recognition (SER)
system is required. However, there are no emotion labels or existing in-domain
SER systems to be used for this purpose. In this paper, we introduce this
initially unannotated large-scale real-world audio dataset and describe the
development of a functional SER system for the Finnish subset of the data. We
explore the effectiveness of alternative state-of-the-art techniques to deploy
a SER system to a new domain, comparing cross-corpus generalization, WGAN-based
domain adaptation, and active learning in the task. As a result, we show that
the best-performing models are able to achieve a classification performance of
73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification
for valence and arousal, respectively. The results also show that active
learning achieves the most consistent performance compared to the two
alternatives.
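For context, unweighted average recall (UAR) is the per-class recall averaged without weighting by class frequency, which keeps the metric honest under the class imbalance typical of real-world recordings. The sketch below is a minimal, hypothetical illustration of the active-learning side of the comparison (pool-based uncertainty sampling, evaluated with UAR); the classifier, feature shapes, and query budget are placeholders rather than the authors' implementation, and the cross-corpus and WGAN-based adaptation branches are not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score  # macro-averaged recall == UAR


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    return recall_score(y_true, y_pred, average="macro")


def active_learning_loop(X_pool, y_pool, X_test, y_test,
                         n_init=50, n_query=25, n_rounds=10, seed=0):
    """Pool-based uncertainty sampling with a generic linear classifier.

    X_pool/y_pool stand in for the unannotated in-domain data; a label is
    'revealed' only when an utterance is sent to the annotator.
    """
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    unlabeled = [i for i in range(len(X_pool)) if i not in set(labeled)]

    for _ in range(n_rounds):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_pool[labeled], y_pool[labeled])

        # Query the utterances the current model is least certain about.
        proba = clf.predict_proba(X_pool[unlabeled])
        uncertainty = 1.0 - proba.max(axis=1)
        picked = [unlabeled[i] for i in np.argsort(uncertainty)[-n_query:]]
        labeled.extend(picked)
        unlabeled = [i for i in unlabeled if i not in set(picked)]

        print(f"labeled={len(labeled):4d}  "
              f"test UAR={uar(y_test, clf.predict(X_test)):.3f}")
```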
Related papers
- Analysing the Impact of Audio Quality on the Use of Naturalistic
Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition [0.0]
Speech Emotion Recognition (SER) has a wide range of applications, including dynamic analysis of customer calls, mental health assessment, and personalized language learning.
Pre-trained models (PTMs) have shown great promise in the speech and audio domain. Embeddings leveraged from these models serve as inputs for learning algorithms with applications in various downstream tasks.
We perform an extensive empirical analysis with four speech emotion datasets (CREMA-D, TESS, SAVEE, Emo-DB) by training three algorithms on the derived embeddings.
The results of our study indicate that the best performance is achieved by algorithms trained on these embeddings.
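As a rough sketch of that workflow (not the paper's exact PTMs, datasets, or learning algorithms), the snippet below pools wav2vec 2.0 features from torchaudio into utterance-level embeddings and trains a simple classifier on them; the file names and labels are placeholders.

```python
import numpy as np
import torch
import torchaudio
from sklearn.svm import SVC

# One possible pre-trained model (PTM); the paper compares several.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()


def utterance_embedding(path: str) -> np.ndarray:
    """Mean-pool the last transformer layer into one fixed-size vector."""
    waveform, sr = torchaudio.load(path)
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        features, _ = model.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0).numpy()


# Placeholder file list and labels standing in for one of the emotion corpora.
files = ["anger_001.wav", "happy_001.wav", "sad_001.wav", "neutral_001.wav"]
labels = np.array([0, 1, 2, 3])
X = np.stack([utterance_embedding(f) for f in files])
clf = SVC().fit(X, labels)   # any downstream learning algorithm works here
print(clf.predict(X))
```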
arXiv Detail & Related papers (2023-04-22T19:56:35Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
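The chain-regression idea can be pictured with scikit-learn's RegressorChain, in which each affective attribute is predicted from the input features plus the predictions for the attributes earlier in the chain; this is a simplified stand-in for the paper's hierarchical framework, with random placeholder data in place of SSL embeddings of vocal bursts.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

# X: one SSL embedding per vocal burst (placeholder random data);
# Y: continuous targets, e.g. [arousal, valence, intensity].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))
Y = rng.normal(size=(200, 3))

# order=[0, 1, 2]: the model for attribute 1 also sees the prediction for
# attribute 0, the model for attribute 2 sees both earlier predictions, etc.
chain = RegressorChain(Ridge(alpha=1.0), order=[0, 1, 2])
chain.fit(X, Y)
print(chain.predict(X[:5]).shape)  # (5, 3)
```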
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition [2.223733768286313]
We present a speech feature enhancement strategy that improves speech emotion recognition.
The strategy is compared with the state-of-the-art methods used in the literature.
Our method achieved an average recognition gain of 11.5% for six out of seven emotions for the EMO-DB dataset, and 13.8% for seven out of eight emotions for the RAVDESS dataset.
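A hedged sketch of the general recipe, not the paper's specific enhancement strategy: rank acoustic descriptors by mutual information with the emotion label and keep only the top-k before classification.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: rows are utterances, columns are acoustic descriptors
# (e.g. MFCC or eGeMAPS statistics); y holds the emotion classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))
y = rng.integers(0, 7, size=300)   # seven emotion classes, as in EMO-DB

pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=40),  # keep the 40 most informative features
    SVC(kernel="rbf"),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```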
arXiv Detail & Related papers (2022-08-19T11:29:03Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
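As a loose illustration of one such learnable pipeline (a hypothetical setup, not the paper's framework), a log-mel front end followed by a small CNN can produce the healthy-versus-pathological decision:

```python
import torch
import torch.nn as nn
import torchaudio

# Log-mel front end followed by a tiny CNN with a binary output.
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=4000, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),
)

waveform = torch.randn(1, 4000 * 5)            # placeholder 5-second recording
spec = to_db(melspec(waveform)).unsqueeze(0)   # (batch, channel, mels, frames)
logits = cnn(spec)                             # (1, 2): healthy vs. pathological
print(logits.softmax(dim=-1))
```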
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Psychophysiological Arousal in Young Children Who Stutter: An Interpretable AI Approach [6.507353572917133]
The presented study effectively identifies and visualizes the second-by-second pattern differences in the physiological arousal of preschool-age children who do stutter (CWS) and who do not stutter (CWNS).
The first condition may affect children's speech due to high arousal; the latter introduces linguistic, cognitive, and communicative demands on speakers.
arXiv Detail & Related papers (2022-08-03T13:28:15Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition [98.70304981174748]
We focus on the general application of pretrained speech representations to advanced end-to-end automatic speech recognition (E2E-ASR) models.
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
arXiv Detail & Related papers (2021-10-09T15:06:09Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
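To make the evaluation protocol concrete, a 10-fold cross-validation mean such as the 89.5% quoted above can be computed as below; the classifier and features are placeholders rather than NUVA itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder features: one row per naming attempt, label 1 = 'correct'.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = rng.integers(0, 2, size=500)

scores = cross_val_score(RandomForestClassifier(), X, y, cv=10, scoring="accuracy")
print(f"10-fold mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```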
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition [7.579298439023323]
This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge.
We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features.
In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models.
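A minimal sketch of the bi-modal idea, with placeholder features and a generic classifier rather than the paper's task-specific resources: concatenate an acoustic feature vector with TF-IDF features of the transcript and train one model on the fused representation.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder data: per-utterance acoustic statistics plus transcripts.
acoustic = np.random.default_rng(0).normal(size=(3, 88))   # e.g. eGeMAPS-sized vectors
transcripts = ["everything is fine", "I feel rather tired", "what a lovely day"]
labels = [1, 0, 1]                                          # e.g. high vs. low valence

linguistic = TfidfVectorizer().fit_transform(transcripts)   # sparse (3, vocab)
X = hstack([acoustic, linguistic])                          # fused bi-modal features
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```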
arXiv Detail & Related papers (2020-09-07T21:19:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.