A Parameter-Efficient Learning Approach to Arabic Dialect Identification
with Pre-Trained General-Purpose Speech Model
- URL: http://arxiv.org/abs/2305.11244v2
- Date: Tue, 3 Oct 2023 19:45:25 GMT
- Title: A Parameter-Efficient Learning Approach to Arabic Dialect Identification
with Pre-Trained General-Purpose Speech Model
- Authors: Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis
A. Kiani, David Gomez-Cabrero, Jesper N. Tegner
- Abstract summary: We develop a token-level label mapping to condition the GSM for Arabic Dialect Identification (ADI).
We achieve new state-of-the-art accuracy on the ADI-17 dataset by vanilla fine-tuning.
Our study demonstrates how to identify Arabic dialects using a small dataset and limited computation with open source code and pre-trained models.
- Score: 9.999900422312098
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we explore Parameter-Efficient-Learning (PEL) techniques to
repurpose a General-Purpose-Speech (GSM) model for Arabic dialect
identification (ADI). Specifically, we investigate different setups to
incorporate trainable features into a multi-layer encoder-decoder GSM
formulation under frozen pre-trained settings. Our architecture includes
residual adapters and model reprogramming (input-prompting). We design a
token-level label mapping to condition the GSM for ADI. This task is
challenging due to the high variation in
vocabulary and pronunciation among the numerous regional dialects. We achieve
new state-of-the-art accuracy on the ADI-17 dataset by vanilla fine-tuning. We
further reduce the training budget with the PEL method, which comes within
1.86% accuracy of fine-tuning while using only 2.5% of (extra) trainable network
parameters. Our study demonstrates how to identify Arabic dialects using a
small dataset and limited computation with open source code and pre-trained
models.
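The token-level label mapping mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each dialect label is tied to one token id in the frozen model's vocabulary and that classification compares only the logits at those ids. The label names, token ids, and function names below are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, Sequence

# Hypothetical mapping from dialect labels to token ids in the frozen
# model's vocabulary. Labels and ids are illustrative, not from the paper.
LABEL_TO_TOKEN: Dict[str, int] = {"EGY": 3, "MOR": 7, "SYR": 11}


def predict_dialect(token_logits: Sequence[float],
                    label_to_token: Dict[str, int]) -> str:
    """Return the label whose mapped token has the highest logit."""
    return max(label_to_token,
               key=lambda label: token_logits[label_to_token[label]])


def label_cross_entropy(token_logits: Sequence[float],
                        label_to_token: Dict[str, int],
                        target: str) -> float:
    """Cross-entropy of the target label, with the softmax restricted
    to the mapped tokens only (all other vocabulary entries are ignored)."""
    scores = [token_logits[t] for t in label_to_token.values()]
    top = max(scores)
    # Numerically stable log-sum-exp over the mapped tokens.
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - token_logits[label_to_token[target]]


if __name__ == "__main__":
    # Stand-in for the decoder's output logits over a 16-token vocabulary.
    logits = [0.0] * 16
    logits[7] = 2.5
    print(predict_dialect(logits, LABEL_TO_TOKEN))
    print(label_cross_entropy(logits, LABEL_TO_TOKEN, "MOR"))
```

Under these assumptions the pre-trained weights stay frozen, and only the added components (for example, adapters or input prompts) would receive gradients from a loss such as `label_cross_entropy`.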
Related papers
- Towards Zero-Shot Text-To-Speech for Arabic Dialects [16.10882912169842]
Zero-shot multi-speaker text-to-speech (ZS-TTS) systems have advanced for English; however, progress for other languages still lags behind due to insufficient resources.
We address this gap for Arabic by first adapting an existing dataset to suit the needs of speech synthesis.
We employ a set of Arabic dialect identification models to explore the impact of pre-defined dialect labels on improving the ZS-TTS model in a multi-dialect setting.
arXiv Detail & Related papers (2024-06-24T15:58:15Z)
- TIPAA-SSL: Text Independent Phone-to-Audio Alignment based on Self-Supervised Learning and Knowledge Transfer [3.9981390090442694]
We present a novel approach for text independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer.
We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English.
Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems.
arXiv Detail & Related papers (2024-05-03T14:25:21Z)
- Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through
Dialect Identification using Transformer-based Approach [0.0]
We highlight our methodology for subtask 1 which deals with country-level dialect identification.
The task uses the Twitter dataset (TWT-2023) that encompasses 18 dialects for the multi-class classification problem.
We achieved an F1-score of 76.65 (11th rank on the leaderboard) on the test dataset.
arXiv Detail & Related papers (2023-11-30T17:37:56Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Pretraining Approaches for Spoken Language Recognition: TalTech
Submission to the OLR 2021 Challenge [0.0]
The paper is based on our submission to the Oriental Language Recognition 2021 Challenge.
For the constrained track, we first trained a Conformer-based encoder-decoder model for multilingual automatic speech recognition.
For the unconstrained task, we relied on both externally available pretrained models as well as external data.
arXiv Detail & Related papers (2022-05-14T15:17:08Z)
- Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource
Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages, English, Sinhala, and Tamil, each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- AraELECTRA: Pre-Training Text Discriminators for Arabic Language
Understanding [0.0]
We develop an Arabic language representation model, which we name AraELECTRA.
Our model is pretrained using the replaced token detection objective on large Arabic text corpora.
We show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and with an even smaller model size.
arXiv Detail & Related papers (2020-12-31T09:35:39Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the code-switching (CS) points.
This encourages the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.