The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR
- URL: http://arxiv.org/abs/2305.19584v1
- Date: Wed, 31 May 2023 06:09:11 GMT
- Title: The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR
- Authors: Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh
- Abstract summary: Building a multilingual Automated Speech Recognition system in a linguistically diverse country like India can be a challenging task.
This problem can be solved by exploiting the fact that many of these languages are phonetically similar.
New approaches are explored and compared to improve the performance of the CLS-based multilingual ASR model.
- Score: 0.2676349883103404
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building a multilingual Automated Speech Recognition (ASR) system in a
linguistically diverse country like India can be a challenging task due to the
differences in scripts and the limited availability of speech data. This
problem can be solved by exploiting the fact that many of these languages are
phonetically similar. These languages can be converted into a Common Label Set
(CLS) by mapping similar sounds to common labels. In this paper, new approaches
are explored and compared to improve the performance of the CLS-based
multilingual ASR model. Language-specific information is infused into the ASR
model either by providing a language ID or by using a CLS-to-native-script
converter on top of the CLS multilingual model. These methods yield a
significant improvement in Word Error Rate (WER) over the CLS baseline and are
further evaluated on out-of-distribution data to check their robustness.
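As a rough illustration of the two ideas in the abstract, the sketch below maps native-script characters of two languages onto shared CLS labels and optionally prepends a language ID token. The grapheme-to-CLS tables are hypothetical placeholders, not the paper's actual Common Label Set.
```python
# Toy grapheme -> CLS tables for two languages; illustrative entries only,
# not the paper's actual Common Label Set.
CLS_MAP = {
    "ta": {"க": "ka", "ம": "ma"},   # Tamil (placeholder entries)
    "hi": {"क": "ka", "म": "ma"},   # Hindi (placeholder entries)
}

def to_cls(text: str, lang: str, add_lang_tag: bool = True) -> list[str]:
    """Map native-script text to a CLS label sequence, optionally
    prepending a language ID token (one of the two ways the paper
    infuses language information into the model)."""
    labels = [CLS_MAP[lang].get(ch, ch) for ch in text]
    if add_lang_tag:
        labels = [f"<{lang}>"] + labels
    return labels

print(to_cls("கம", "ta"))  # ['<ta>', 'ka', 'ma']
print(to_cls("कम", "hi"))  # ['<hi>', 'ka', 'ma'] -- same CLS labels
```
Phonetically similar characters from different scripts collapse onto the same CLS labels, which is what lets a single multilingual model share training data across languages.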
Related papers
- Language Bias in Self-Supervised Learning For Automatic Speech Recognition [15.976590369684464]
Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data.
In this paper, we identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages.
arXiv Detail & Related papers (2025-01-31T17:16:45Z)
- Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively.
However, Whisper struggles with unseen languages, those not included in its pre-training.
We propose methods that exploit the relationships between unseen languages and the languages seen during pre-training to enhance ASR performance on unseen languages.
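A hedged reading of "language embedding modeling" is sketched below: the unseen language's embedding is approximated as a weighted combination of embeddings of related pre-training languages. The weights, dimensions, and language choices here are illustrative assumptions, not the paper's values.
```python
import numpy as np

rng = np.random.default_rng(0)
# Toy embeddings for two languages assumed to be in the pre-training set.
seen = {"hi": rng.normal(size=4), "bn": rng.normal(size=4)}

# Hypothetical relatedness weights of an unseen language to the seen ones.
weights = {"hi": 0.7, "bn": 0.3}

# Weighted combination used as the unseen language's embedding.
unseen_emb = sum(w * seen[lang] for lang, w in weights.items())
print(unseen_emb)  # vector that could condition the ASR model
```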
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer [17.700515986659063]
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.
This paper proposes a new method for creating code-switching ASR datasets from purely monolingual data sources.
A novel Concatenated Tokenizer enables ASR models to generate language ID for each emitted text token while reusing existing monolingual tokenizers.
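The concatenation idea can be pictured with a toy sketch: each monolingual tokenizer keeps its own contiguous ID range, so every emitted token ID simultaneously identifies the language. The vocabularies and language pair below are made up for illustration; real systems would use subword tokenizers.
```python
# Toy monolingual vocabularies (hypothetical tokens, not the paper's).
en_vocab = ["hello", "world"]   # English tokens
es_vocab = ["hola", "mundo"]    # Spanish tokens

# Concatenate: IDs 0..1 are English, IDs 2..3 are Spanish.
vocab = en_vocab + es_vocab
es_offset = len(en_vocab)

def decode(token_id: int) -> tuple[str, str]:
    """Map an emitted token ID back to (token, language ID)."""
    lang = "en" if token_id < es_offset else "es"
    return vocab[token_id], lang

# A code-switched hypothesis: every token carries a language label.
print([decode(tid) for tid in [0, 2, 1, 3]])
# [('hello', 'en'), ('hola', 'es'), ('world', 'en'), ('mundo', 'es')]
```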
arXiv Detail & Related papers (2023-06-14T21:24:11Z)
- DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set [0.0]
Common Label Set (CLS) maps graphemes of various languages with similar sounds to common labels.
Since Indian languages are mostly phonetic, building a transliteration system to convert from native script to CLS is straightforward.
We propose a novel dual-decoder architecture (DuDe) for building multilingual systems.
arXiv Detail & Related papers (2022-10-30T04:01:26Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both monolingual and multilingual speech by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
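One way to picture such a code-switching scheme is to swap words into another language via a bilingual lexicon, producing near-miss utterances usable as hard negatives. The lexicon and substitution rule below are illustrative assumptions, not the paper's actual schemes.
```python
import random

# Hypothetical English -> Spanish lexicon; not the paper's resources.
lexicon = {"set": "poner", "an": "una", "alarm": "alarma"}

def code_switch(tokens: list[str], ratio: float = 0.5) -> list[str]:
    """Swap a fraction of translatable tokens into the other language,
    yielding a near-miss utterance usable as a hard negative."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in lexicon]
    if not candidates:
        return out
    k = max(1, int(len(candidates) * ratio))
    for i in random.sample(candidates, k):
        out[i] = lexicon[out[i]]
    return out

print(code_switch(["set", "an", "alarm", "for", "7am"]))
```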
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
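A minimal sketch of the distillation step described above: average the language-branch teachers' soft predictions and train a single student against that target. The logits, temperature, and loss form are illustrative; the paper's actual objectives are not reproduced here.
```python
import numpy as np

def softmax(z, temperature=2.0):
    """Temperature-softened distribution over logits."""
    e = np.exp((z - z.max()) / temperature)
    return e / e.sum()

# Toy logits from two language-branch teacher models and one student.
teacher_logits = [np.array([2.0, 0.5, -1.0]),
                  np.array([1.5, 0.8, -0.5])]
student_logits = np.array([1.0, 1.0, 0.0])

# Amalgamate teachers by averaging their soft predictions, then measure
# how far the student is from that target with a KL divergence.
target = np.mean([softmax(z) for z in teacher_logits], axis=0)
student = softmax(student_logits)
kd_loss = float(np.sum(target * (np.log(target) - np.log(student))))
print(kd_loss)  # loss the single student model would minimize
```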
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.