Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition
- URL: http://arxiv.org/abs/2009.03432v1
- Date: Mon, 7 Sep 2020 21:19:16 GMT
- Title: Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition
- Authors: Gizem Soğancıoğlu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov
- Abstract summary: This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge.
We propose a bi-modal framework in which the arousal and valence tasks are modeled using state-of-the-art acoustic and linguistic features, respectively.
In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models.
- Score: 7.579298439023323
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Acoustic and linguistic analysis for elderly emotion recognition is an
under-studied and challenging research direction, but it is essential for
building digital assistants for the elderly and for unobtrusive telemonitoring
of elderly people in their residences for mental healthcare purposes.
This paper presents our contribution to the INTERSPEECH 2020 Computational
Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge, which
comprises two ternary classification tasks: arousal recognition and valence
recognition. We propose a bi-modal framework in which arousal and valence are
modeled using state-of-the-art acoustic and linguistic features, respectively.
In this study, we demonstrate that exploiting task-specific dictionaries and
resources can boost the performance of linguistic models when the amount of
labeled data is small. Observing a large mismatch between the development and
test set performances of various models, we also propose alternative training
and decision fusion strategies to better estimate and improve generalization
performance.
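The abstract mentions decision fusion of the acoustic and linguistic models for the ternary tasks but does not state the fusion rule. The sketch below assumes the simplest variant, weighted late fusion: each model outputs per-class probabilities for an utterance, the two vectors are mixed with a weight, and the highest fused score wins. The function names (`fuse_decisions`, `unweighted_average_recall`, `pick_weight`), the weight grid, and the use of unweighted average recall (UAR) as the selection criterion are illustrative assumptions, not the authors' documented method.

```python
from typing import List, Sequence


def fuse_decisions(p_acoustic: Sequence[Sequence[float]],
                   p_linguistic: Sequence[Sequence[float]],
                   w_acoustic: float = 0.5) -> List[int]:
    """Weighted late fusion (assumed rule): mix two models' per-class
    probabilities per utterance, then return the argmax class index
    (0..2 for a ternary task, e.g. low / medium / high)."""
    if not 0.0 <= w_acoustic <= 1.0:
        raise ValueError("w_acoustic must lie in [0, 1]")
    predictions = []
    for pa, pl in zip(p_acoustic, p_linguistic):
        # Convex combination of the two probability vectors for one utterance.
        scores = [w_acoustic * a + (1.0 - w_acoustic) * l for a, l in zip(pa, pl)]
        # Index of the highest fused score; ties resolve to the lowest index.
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int],
                              n_classes: int = 3) -> float:
    """Mean of per-class recalls; classes absent from y_true are skipped."""
    recalls = []
    for c in range(n_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    if not recalls:
        raise ValueError("y_true contains no labels in range(n_classes)")
    return sum(recalls) / len(recalls)


def pick_weight(p_acoustic: Sequence[Sequence[float]],
                p_linguistic: Sequence[Sequence[float]],
                y_true: Sequence[int],
                grid: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    """Hypothetical helper: return the grid weight whose fused predictions
    reach the highest UAR on held-out labels (first weight wins on ties)."""
    return max(grid, key=lambda w: unweighted_average_recall(
        y_true, fuse_decisions(p_acoustic, p_linguistic, w)))
```

Because the weight is tuned on held-out predictions, it inherits any quirks of that data: a weight that looks best on a small development set may not transfer to the test set, which is one way the development/test mismatch described in the abstract can surface.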
Related papers
- Identification of Cognitive Decline from Spoken Language through Feature Selection and the Bag of Acoustic Words Model [0.0]
The early identification of symptoms of memory disorders plays a significant role in ensuring the well-being of populations.
The lack of standardized speech tests in clinical settings has led to a growing emphasis on developing automatic machine learning techniques for analyzing naturally spoken language.
The work presents an approach related to feature selection, allowing for the automatic selection of the essential features required for diagnosis from the Geneva minimalistic acoustic parameter set and relative speech pauses.
(2024-02-02T17:06:03Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
(2023-06-02T12:54:38Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
(2023-03-14T16:08:45Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
(2023-03-14T16:03:28Z)
- Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset [77.99182201815763]
The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people.
(2022-11-14T12:39:41Z)
- Acoustic-Linguistic Features for Modeling Neurological Task Score in Alzheimer's [1.290382979353427]
Natural language processing and machine learning provide promising techniques for reliably detecting Alzheimer's disease.
We compare and contrast the performance of ten linear regression models for predicting Mini-Mental State Examination (MMSE) scores.
We find that, for the given task, handcrafted linguistic features are more significant than acoustic and learned features.
(2022-09-13T15:35:31Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
(2022-05-21T16:52:57Z)
- Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform as well as or better than the classification approach.
(2021-11-30T06:28:58Z)
- Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge [10.497861245133086]
The ADReSS Challenge at INTERSPEECH 2020 defines a shared task through which different approaches to the automated recognition of Alzheimer's dementia can be compared.
ADReSS provides researchers with a benchmark speech dataset which has been acoustically pre-processed and balanced in terms of age and gender.
(2020-04-14T23:25:09Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
(2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.