Transfer Learning based Speech Affect Recognition in Urdu
- URL: http://arxiv.org/abs/2103.03580v1
- Date: Fri, 5 Mar 2021 10:30:58 GMT
- Title: Transfer Learning based Speech Affect Recognition in Urdu
- Authors: Sara Durrani, Muhammad Umair Arshad
- Abstract summary: We pre-train a model on a high-resource language affect recognition task and fine-tune its parameters for a low-resource language.
This approach achieves a high Unweighted Average Recall (UAR) compared with existing algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been established that Speech Affect Recognition for low-resource
languages is a difficult task. Here we present a transfer-learning-based Speech
Affect Recognition approach in which we pre-train a model on a high-resource
language affect recognition task and fine-tune its parameters for a low-resource
language using a Deep Residual Network. We use four standard data sets to
demonstrate that transfer learning can solve the problem of data scarcity for
the Affect Recognition task. We demonstrate that our approach is efficient by
achieving 74.7 percent UAR with RAVDESS as the source and the Urdu data set as
the target. Through an ablation study, we have identified that the pre-trained
model contributes most of the feature information, improves the results, and
mitigates the limited-data problem. Using this knowledge, we have also
experimented on the SAVEE and EMO-DB data sets with Urdu as the target language,
where only 400 utterances are available. This approach achieves a higher
Unweighted Average Recall (UAR) than existing algorithms.
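As a rough illustration of the pipeline described in the abstract, the sketch below (Python with PyTorch, torchvision, and scikit-learn) pre-trains a ResNet-style classifier on a high-resource corpus, transfers its weights to a fresh classification head for the low-resource Urdu set, and scores predictions with UAR (macro-averaged recall). The ResNet-18 backbone, single-channel spectrogram input, class counts, frozen layers, and learning rate are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of transfer learning for speech affect recognition.
# Assumes log-mel spectrogram inputs and a ResNet-18 backbone; the paper's
# exact architecture, features, and hyperparameters are not specified here.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from sklearn.metrics import recall_score

NUM_SOURCE_CLASSES = 8   # e.g. RAVDESS emotion classes (assumption)
NUM_TARGET_CLASSES = 4   # e.g. Urdu corpus affect classes (assumption)

def build_backbone(num_classes: int) -> nn.Module:
    """ResNet-18 adapted to single-channel spectrogram input."""
    model = resnet18(weights=None)
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# 1) Pre-train on the high-resource (source) corpus, e.g. RAVDESS.
source_model = build_backbone(NUM_SOURCE_CLASSES)
# ... train source_model on source spectrograms and labels here ...

# 2) Transfer: copy the pre-trained weights (except the classifier head)
#    into a model with a new head sized for the low-resource target set.
target_model = build_backbone(NUM_TARGET_CLASSES)
state = {k: v for k, v in source_model.state_dict().items() if not k.startswith("fc.")}
target_model.load_state_dict(state, strict=False)

# Optionally freeze early layers so only higher layers adapt to the small
# target set (a common choice; the paper's exact setting may differ).
for name, param in target_model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in target_model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
# ... fine-tune target_model on the Urdu training split here ...

def unweighted_average_recall(y_true, y_pred) -> float:
    """UAR = mean of per-class recall (macro-averaged recall)."""
    return recall_score(y_true, y_pred, average="macro")
```

Freezing the early layers is one common way to keep a very small target set (such as the roughly 400 Urdu utterances) from overwriting the transferred features; the abstract does not state whether any layers are frozen in the paper's setup.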
Related papers
- Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis [4.774607166378613]
Self-supervised pre-training, combined with large language models (LLM), can effectively enhance ASR performance in low-resource scenarios.
We pre-train a Data2vec2 model on 300,000 hours of unlabeled dialect and accented speech data and do alignment training on a supervised dataset of 40,000 hours.
arXiv Detail & Related papers (2025-05-27T12:50:55Z)
- Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data [59.30098850050971]
Cross-lingual transfer learning can improve performance on tasks with limited labeled data.
We leverage nearest-neighbor retrieval to augment minimal labeled data in the target language.
We evaluate our approach on eight languages and demonstrate that it consistently outperforms models trained solely on the target language data.
arXiv Detail & Related papers (2025-05-20T12:25:33Z)
- Evaluating Standard and Dialectal Frisian ASR: Multilingual Fine-tuning and Language Identification for Improved Low-resource Performance [9.624005980086707]
State-of-the-art methods deploy self-supervised transfer learning where a model pre-trained on large amounts of data is fine-tuned using little labeled data.
We show that Frisian ASR performance can be improved by using multilingual fine-tuning data and an auxiliary language identification task.
arXiv Detail & Related papers (2025-02-07T12:42:46Z)
- Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection [49.27067541740956]
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction.
Building robust multilingual SER systems remains challenging due to the scarcity of labeled data in languages other than English and Chinese.
We propose an approach to enhance SER performance in low SER resource languages by leveraging data from high-resource languages.
arXiv Detail & Related papers (2024-09-17T08:36:45Z)
- Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages [24.856817602140193]
This study focuses on two endangered Austronesian languages, Amis and Seediq.
We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data.
arXiv Detail & Related papers (2024-09-13T14:35:47Z)
- Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition [31.575930914290762]
Exploiting cross-lingual resources is an effective way to compensate for data scarcity of low resource languages.
We extend the concept of learnable cross-lingual mappings for end-to-end speech recognition.
The results show that any source language ASR model can be used for a low-resource target language recognition.
arXiv Detail & Related papers (2023-06-14T15:24:31Z)
- Multilingual Speech Recognition using Knowledge Transfer across Learning Processes [15.927513451432946]
Experimental results reveal that the best pre-training strategy yields a 3.55% relative reduction in overall WER.
A combination of LEAP and SSL yields a 3.51% relative reduction in overall WER when using language ID.
arXiv Detail & Related papers (2021-10-15T07:50:27Z)
- Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding [69.40915115518523]
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper we focus on mitigating noise in augmented data.
arXiv Detail & Related papers (2021-09-03T15:44:15Z)
- Transfer learning from High-Resource to Low-Resource Language Improves Speech Affect Recognition Classification Accuracy [0.0]
We present an approach in which the model is trained on a high-resource language and fine-tuned to recognize affect in a low-resource language.
We train the model in the same-corpus setting on SAVEE, EMOVO, Urdu, and IEMOCAP, achieving baseline accuracies of 60.45, 68.05, 80.34, and 56.58 percent, respectively.
arXiv Detail & Related papers (2021-03-04T08:17:19Z)
- A Transfer Learning Method for Speech Emotion Recognition from Automatic Speech Recognition [0.0]
We present a transfer learning method for speech emotion recognition based on a Time-Delay Neural Network architecture.
We achieve significantly higher accuracy than the state of the art, using five-fold cross-validation.
arXiv Detail & Related papers (2020-08-06T20:37:22Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation [63.16500026845157]
We introduce speech-to-text translation as an auxiliary task to incorporate additional knowledge of the target language.
We show that training ST with human translations is not necessary.
Even with pseudo-labels from low-resource MT (200K examples), ST-enhanced transfer brings up to an 8.9% WER reduction over direct transfer.
arXiv Detail & Related papers (2020-06-09T19:34:11Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, to transfer learn on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages, and transfer them so as to better recognize mixed-language speech by conditioning the optimization on the code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)
- Learning to Learn Morphological Inflection for Resource-Poor Languages [105.11499402984482]
We propose to cast the task of morphological inflection - mapping a lemma to an indicated inflected form - for resource-poor languages as a meta-learning problem.
Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters.
Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines.
arXiv Detail & Related papers (2020-04-28T05:13:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.