Transfer learning from High-Resource to Low-Resource Language Improves
Speech Affect Recognition Classification Accuracy
- URL: http://arxiv.org/abs/2103.11764v1
- Date: Thu, 4 Mar 2021 08:17:19 GMT
- Title: Transfer learning from High-Resource to Low-Resource Language Improves
Speech Affect Recognition Classification Accuracy
- Authors: Sara Durrani and Umair Arshad
- Abstract summary: We present an approach in which the model is trained on a high-resource language and fine-tuned to recognize affects in a low-resource language.
We train the model in the same-corpus setting on SAVEE, EMOVO, Urdu, and IEMOCAP, achieving baseline accuracies of 60.45, 68.05, 80.34, and 56.58 percent, respectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Speech Affect Recognition is the problem of extracting emotional affects from
audio data. Corpora for low-resource languages are rare, and affect recognition is a
difficult task in cross-corpus settings. We present an approach in which the
model is trained on a high-resource language and fine-tuned to recognize affects
in a low-resource language. We train the model in the same-corpus setting on SAVEE,
EMOVO, Urdu, and IEMOCAP, achieving baseline accuracies of 60.45, 68.05, 80.34,
and 56.58 percent, respectively. To capture the diversity of affects across
languages, cross-corpus evaluations are discussed in detail. We find that
accuracy improves when target-domain data is added to the training data.
Finally, we show that performance improves for low-resource-language speech
affect recognition, achieving UARs of 69.32 and 68.2 for Urdu and Italian
speech affects, respectively.
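The pre-train-then-fine-tune recipe the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the feature dimension, layer sizes, class counts, and the use of random tensors as stand-ins for MFCC-style features are all assumptions for the sake of a runnable example.

```python
# Hedged sketch: pre-train an affect classifier on a high-resource corpus,
# then keep the encoder and fine-tune a fresh head on low-resource data.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_FEATURES = 40    # e.g. 40 MFCC coefficients per utterance (assumed)
HIGH_CLASSES = 7   # label-set size of the high-resource corpus (assumed)
LOW_CLASSES = 4    # label-set size of the low-resource corpus (assumed)

class AffectNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # Shared encoder: its weights carry over to the low-resource task.
        self.encoder = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

def train(model, x, y, epochs=20, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# 1) Pre-train on (synthetic stand-ins for) high-resource data.
x_hi = torch.randn(256, N_FEATURES)
y_hi = torch.randint(0, HIGH_CLASSES, (256,))
model = AffectNet(HIGH_CLASSES)
train(model, x_hi, y_hi)

# 2) Fine-tune: keep the encoder, swap the head for the low-resource labels.
model.head = nn.Linear(32, LOW_CLASSES)
x_lo = torch.randn(64, N_FEATURES)
y_lo = torch.randint(0, LOW_CLASSES, (64,))
train(model, x_lo, y_lo, epochs=10)

logits = model(x_lo[:5])
print(logits.shape)  # torch.Size([5, 4])
```

Swapping only the classification head while reusing the encoder is one common transfer strategy; variants also freeze the encoder or fine-tune all layers at a lower learning rate.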
Related papers
- CLARA: Multilingual Contrastive Learning for Audio Representation
Acquisition [5.520654376217889]
CLARA minimizes reliance on labelled data, enhancing generalization across languages.
Our approach adeptly captures emotional nuances in speech, overcoming subjective assessment issues.
It adapts to low-resource languages, marking progress in multilingual speech representation learning.
arXiv Detail & Related papers (2023-10-18T09:31:56Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC)
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Data Augmentation for Speech Recognition in Maltese: A Low-Resource Perspective [4.6898263272139795]
We consider data augmentation techniques for improving speech recognition in Maltese.
We consider three types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data.
Our results show that combining the three data augmentation techniques studied here leads to an absolute WER improvement of 15% without the use of a language model.
arXiv Detail & Related papers (2021-11-15T14:28:21Z)
- Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success on text-based tasks and speech-related tasks for some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
- Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language [32.170748231414365]
We show that training on even just a single related language gives the largest gain.
We also find that adding data from unrelated languages generally doesn't hurt performance.
arXiv Detail & Related papers (2021-06-24T08:37:05Z)
- Transfer Learning based Speech Affect Recognition in Urdu [0.0]
We pre-train a model on a high-resource-language affect recognition task and fine-tune the parameters for a low-resource language.
This approach achieves high Unweighted Average Recall (UAR) when compared with existing algorithms.
arXiv Detail & Related papers (2021-03-05T10:30:58Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to 8.9% WER reduction to direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
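Several results above are reported as Unweighted Average Recall (UAR), the standard metric for class-imbalanced affect corpora. As a quick reference, UAR is the mean of the per-class recalls; a minimal implementation (with hypothetical labels) looks like this:

```python
# UAR (Unweighted Average Recall): average recall per class, so minority
# classes count as much as majority ones. Labels below are illustrative.
from collections import defaultdict

def uar(y_true, y_pred):
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

y_true = ["angry", "angry", "sad", "happy", "happy", "happy"]
y_pred = ["angry", "sad",   "sad", "happy", "angry", "happy"]
# Per-class recall: angry 1/2, sad 1/1, happy 2/3 -> mean ~0.7222
print(round(uar(y_true, y_pred), 4))  # 0.7222
```

Unlike plain accuracy, UAR is unaffected by how many samples each class has, which is why it is preferred when emotion classes are unevenly represented.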
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.