Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
- URL: http://arxiv.org/abs/2306.14517v1
- Date: Mon, 26 Jun 2023 08:48:08 GMT
- Title: Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
- Authors: Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung
- Abstract summary: We analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese.
This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method.
- Score: 48.29355616574199
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Speech emotion recognition plays a crucial role in human-computer
interactions. However, most speech emotion recognition research is biased
toward English-speaking adults, which hinders its applicability to other
demographic groups in different languages and age groups. In this work, we
analyze the transferability of emotion recognition across three different
languages--English, Mandarin Chinese, and Cantonese; and two different age
groups--adults and the elderly. To conduct the experiment, we develop an
English-Mandarin speech emotion benchmark for adults and the elderly, BiMotion,
and a Cantonese speech emotion dataset, YueMotion. This study concludes that
different language and age groups require specific speech features, thus making
cross-lingual inference an unsuitable method. However, cross-group data
augmentation is still beneficial to regularize the model, with linguistic
distance being a significant influence on cross-lingual transferability. We
publicly release our code at https://github.com/HLTCHKUST/elderly_ser.
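To make the cross-group data augmentation idea concrete, here is a minimal PyTorch sketch in which dummy tensor datasets stand in for the real adult/elderly corpora; the splits, feature dimension, and label count are illustrative assumptions, not the paper's configuration.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def dummy_ser_split(n_utts):
    """Stand-in for a real corpus: (acoustic features, emotion label) pairs."""
    return TensorDataset(torch.randn(n_utts, 40), torch.randint(0, 4, (n_utts,)))

# One split per language/age group; in practice these would be real corpora.
en_adult, en_elderly = dummy_ser_split(500), dummy_ser_split(80)
zh_adult, zh_elderly = dummy_ser_split(500), dummy_ser_split(80)

# Cross-group augmentation: train on the pooled groups rather than on the
# low-resource target group (e.g., English elderly) alone, so the extra
# data acts as a regularizer.
train_set = ConcatDataset([en_elderly, en_adult, zh_elderly, zh_adult])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
```

Training on the pooled loader exposes the model to the other groups' feature distributions, which is the regularization effect the abstract describes.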
Related papers
- Cross-lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models [16.0617753653454]
This study presents a comparative analysis between human performance and that of self-supervised learning (SSL) models.
We also compare the SER ability of models and humans at both the utterance and segment levels.
Our findings reveal that models, with appropriate knowledge transfer, can adapt to the target language and achieve performance comparable to native speakers.
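As a rough illustration of the knowledge-transfer setup such studies rely on, the sketch below fine-tunes a self-supervised speech backbone for utterance-level SER with the HuggingFace transformers API; the backbone name and label set are assumptions, not this study's exact protocol.

```python
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_name = "facebook/wav2vec2-base"          # assumed SSL backbone
labels = ["angry", "happy", "neutral", "sad"]  # assumed label set

extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForAudioClassification.from_pretrained(
    model_name, num_labels=len(labels)         # fresh classification head
)

# One dummy 16 kHz utterance; real input would be a decoded waveform.
waveform = torch.zeros(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

logits = model(**inputs).logits                # shape: [1, num_labels]
loss = torch.nn.functional.cross_entropy(logits, torch.tensor([2]))
loss.backward()                                # one adaptation step (optimizer omitted)
```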
arXiv Detail & Related papers (2024-09-25T13:27:17Z)
- CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition [5.520654376217889]
CLARA minimizes reliance on labelled data, enhancing generalization across languages.
Our approach adeptly captures emotional nuances in speech, overcoming subjective assessment issues.
It adapts to low-resource languages, marking progress in multilingual speech representation learning.
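A minimal sketch of the symmetric InfoNCE-style contrastive objective that approaches like CLARA build on; the audio-text pairing and temperature value here are illustrative assumptions rather than CLARA's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(audio_emb, text_emb, temperature=0.07):
    """audio_emb, text_emb: [batch, dim] embeddings of paired examples."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature             # pairwise similarities
    targets = torch.arange(a.size(0))          # matching pairs lie on the diagonal
    # Symmetric loss: audio->text and text->audio retrieval.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = info_nce(torch.randn(8, 512), torch.randn(8, 512))
```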
arXiv Detail & Related papers (2023-10-18T09:31:56Z)
- Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data [0.0]
We propose a method for speech-to-speech emotion translation that operates at the level of discrete speech units.
We show that this embedding can be used to predict the pitch and duration of speech units in a target language.
We evaluate our approach on English and French speech signals and show that it outperforms a baseline method.
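A minimal sketch of per-unit prosody prediction, assuming (as the summary describes) speech tokenized into discrete units with regression heads for pitch and duration; the architecture itself is illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ProsodyPredictor(nn.Module):
    def __init__(self, n_units=100, dim=128):
        super().__init__()
        self.embed = nn.Embedding(n_units, dim)      # discrete speech units
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.pitch_head = nn.Linear(dim, 1)          # per-unit F0 (e.g., log Hz)
        self.dur_head = nn.Linear(dim, 1)            # per-unit duration (frames)

    def forward(self, units):                        # units: [batch, seq]
        h, _ = self.rnn(self.embed(units))
        return self.pitch_head(h).squeeze(-1), self.dur_head(h).squeeze(-1)

model = ProsodyPredictor()
pitch, dur = model(torch.randint(0, 100, (2, 50)))   # two dummy unit sequences
```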
arXiv Detail & Related papers (2023-06-29T08:06:54Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently-proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multilingual models trained on more data outperform monolingual ones, but when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset [77.99182201815763]
The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people.
arXiv Detail & Related papers (2022-11-14T12:39:41Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features [4.446085353384894]
We used Bangla and English languages to assess whether distinguishing emotions from speech is independent of language.
The following emotions were categorized for this study: happiness, anger, neutral, sadness, disgust, and fear.
Although this study reveals that Speech Emotion Recognition (SER) is mostly language-independent, there is some disparity in recognizing emotional states like disgust and fear in these two languages.
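For context, the sketch below extracts the kind of language-independent vocal features (pitch, energy, MFCCs) such studies typically use, via librosa; the exact feature set used in the paper may differ, and the example clip is a stand-in for real speech.

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))         # stand-in for a speech clip

f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)       # frame-level pitch (Hz)
energy = librosa.feature.rms(y=y)[0]                # frame-level energy
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope

# Utterance-level statistics as a fixed-size feature vector.
features = np.concatenate([
    [f0.mean(), f0.std()],
    [energy.mean(), energy.std()],
    mfcc.mean(axis=1), mfcc.std(axis=1),
])
```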
arXiv Detail & Related papers (2021-11-21T09:28:49Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between established theories in cognitive psychology and our modeling results.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
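One common way to quantify such bias is sketched below with random stand-in vectors: compare a target word's average cosine similarity to male versus female attribute words. The word lists and embedding table are illustrative assumptions, not the paper's proposed metrics.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=300) for w in
       ["doctor", "nurse", "he", "she", "man", "woman"]}  # stand-in vectors

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias(word, male=("he", "man"), female=("she", "woman")):
    m = np.mean([cos(emb[word], emb[a]) for a in male])
    f = np.mean([cos(emb[word], emb[a]) for a in female])
    return m - f        # >0: word sits closer to the male attribute set

print(bias("doctor"), bias("nurse"))
```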
arXiv Detail & Related papers (2020-05-02T04:34:37Z)