Sentiment recognition of Italian elderly through domain adaptation on
cross-corpus speech dataset
- URL: http://arxiv.org/abs/2211.07307v1
- Date: Mon, 14 Nov 2022 12:39:41 GMT
- Title: Sentiment recognition of Italian elderly through domain adaptation on
cross-corpus speech dataset
- Authors: Francesca Gasparini, Alessandra Grossi
- Abstract summary: The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral and negative emotions in natural conversations of Italian elderly people.
- Score: 77.99182201815763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of this work is to define a speech emotion recognition (SER) model
able to recognize positive, neutral and negative emotions in natural
conversations of Italian elderly people. Several datasets for SER are available
in the literature. However, most of them are in English or Chinese, were
recorded while actors and actresses pronounced short phrases, and thus do not
reflect natural conversation. Moreover, only a few recordings across these
databases involve elderly people. Therefore, in this work, a
multi-language and multi-age corpus is considered merging a dataset in English,
that includes also elderly people, with a dataset in Italian. A general model,
trained on young and adult English actors and actresses is proposed, based on
XGBoost. Two domain adaptation strategies are then proposed to adapt the
model both to elderly people and to Italian speakers. The results suggest
that this approach improves classification performance, while also
underlining that new datasets should be collected.
Related papers
- SER_AMPEL: a multi-source dataset for speech emotion recognition of
Italian older adults [58.49386651361823]
SER_AMPEL is a multi-source dataset for speech emotion recognition (SER).
It was collected to provide a reference for SER in the case of Italian older adults.
The evidence of the need for such a dataset emerges from the analysis of the state of the art.
arXiv Detail & Related papers (2023-11-24T13:47:25Z) - Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech
Emotion Recognition [48.29355616574199]
We analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese.
This study concludes that different language and age groups require specific speech features, thus making cross-lingual inference an unsuitable method.
arXiv Detail & Related papers (2023-06-26T08:48:08Z) - ITALIC: An Italian Intent Classification Dataset [16.970030804283745]
ITALIC is the first large-scale speech dataset designed for intent classification in Italian.
The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Italian regions.
Results on intent classification suggest that increasing scale and running language adaptation yield better speech models.
arXiv Detail & Related papers (2023-06-14T13:36:24Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multi-lingual models with more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - Adapting Multilingual Speech Representation Model for a New,
Underresourced Language through Multilingual Fine-tuning and Continued
Pretraining [2.3513645401551333]
We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that, if a model pretrained on a related speech variety or on an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z) - M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for
Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state of the art by a wide margin, both when training separate models for each language and with a single model that processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust
Elderly Speech Emotion Recognition [7.579298439023323]
This paper presents our contribution to the INTERSPEECH 2020 Computational Paralinguistics Challenge (ComParE) - Elderly Emotion Sub-Challenge.
We propose a bi-modal framework, where these tasks are modeled using state-of-the-art acoustic and linguistic features.
In this study, we demonstrate that exploiting task-specific dictionaries and resources can boost the performance of linguistic models.
arXiv Detail & Related papers (2020-09-07T21:19:16Z) - Investigating Language Impact in Bilingual Approaches for Computational
Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects the posterior documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.