The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR
- URL: http://arxiv.org/abs/2305.19584v1
- Date: Wed, 31 May 2023 06:09:11 GMT
- Title: The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR
- Authors: Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh
- Abstract summary: Building a multilingual Automated Speech Recognition system in a linguistically diverse country like India can be a challenging task.
This problem can be solved by exploiting the fact that many of these languages are phonetically similar.
New approaches are explored and compared to improve the performance of the CLS-based multilingual ASR model.
- Score: 0.2676349883103404
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Building a multilingual Automated Speech Recognition (ASR) system in a
linguistically diverse country like India can be a challenging task due to the
differences in scripts and the limited availability of speech data. This
problem can be solved by exploiting the fact that many of these languages are
phonetically similar. These languages can be converted into a Common Label Set
(CLS) by mapping similar sounds to common labels. In this paper, new approaches
are explored and compared to improve the performance of the CLS-based
multilingual ASR model. Language-specific information is infused into the ASR
model either by providing a language ID or by using a CLS-to-native-script
converter on top of the CLS multilingual model. These methods yield a
significant improvement in Word Error Rate (WER) over the CLS baseline and are
further evaluated on out-of-distribution data to check their robustness.
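As a rough illustration of the two ideas in the abstract, the sketch below maps native-script characters of two languages onto shared CLS labels and optionally prepends a language ID token. The grapheme-to-CLS tables are hypothetical placeholders, not the paper's actual Common Label Set.
```python
# Toy grapheme -> CLS tables for two languages; illustrative entries only,
# not the paper's actual Common Label Set.
CLS_MAP = {
    "ta": {"க": "ka", "ம": "ma"},   # Tamil (placeholder entries)
    "hi": {"क": "ka", "म": "ma"},   # Hindi (placeholder entries)
}

def to_cls(text: str, lang: str, add_lang_tag: bool = True) -> list[str]:
    """Map native-script text to a CLS label sequence, optionally
    prepending a language ID token (one of the two ways the paper
    infuses language information into the model)."""
    labels = [CLS_MAP[lang].get(ch, ch) for ch in text]
    if add_lang_tag:
        labels = [f"<{lang}>"] + labels
    return labels

print(to_cls("கம", "ta"))  # ['<ta>', 'ka', 'ma']
print(to_cls("कम", "hi"))  # ['<hi>', 'ka', 'ma'] -- same CLS labels
```
Phonetically similar characters from different scripts collapse onto the same CLS labels, which is what lets a single multilingual model share training data across languages.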
Related papers
- Language Bias in Self-Supervised Learning For Automatic Speech Recognition [15.976590369684464]
Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data.
In this paper, we identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages.
arXiv Detail & Related papers (2025-01-31T17:16:45Z)
- Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling [50.62091603179394]
Whisper, one of the most advanced ASR models, handles 99 languages effectively.
However, Whisper struggles with unseen languages, those not included in its pre-training.
We propose methods that exploit the relationships between unseen languages and the languages seen during pre-training to enhance ASR performance on unseen languages.
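A hedged reading of "language embedding modeling" is sketched below: the unseen language's embedding is approximated as a weighted combination of embeddings of related pre-training languages. The weights, dimensions, and language choices here are illustrative assumptions, not the paper's values.
```python
import numpy as np

rng = np.random.default_rng(0)
# Toy embeddings for two languages assumed to be in the pre-training set.
seen = {"hi": rng.normal(size=4), "bn": rng.normal(size=4)}

# Hypothetical relatedness weights of an unseen language to the seen ones.
weights = {"hi": 0.7, "bn": 0.3}

# Weighted combination used as the unseen language's embedding.
unseen_emb = sum(w * seen[lang] for lang, w in weights.items())
print(unseen_emb)  # vector that could condition the ASR model
```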
arXiv Detail & Related papers (2024-12-21T04:05:43Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer [17.700515986659063]
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.
This paper proposes a new method for creating code-switching ASR datasets from purely monolingual data sources.
A novel Concatenated Tokenizer enables ASR models to generate language ID for each emitted text token while reusing existing monolingual tokenizers.
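The concatenation idea can be pictured with a toy sketch: each monolingual tokenizer keeps its own contiguous ID range, so every emitted token ID simultaneously identifies the language. The vocabularies and language pair below are made up for illustration; real systems would use subword tokenizers.
```python
# Toy monolingual vocabularies (hypothetical tokens, not the paper's).
en_vocab = ["hello", "world"]   # English tokens
es_vocab = ["hola", "mundo"]    # Spanish tokens

# Concatenate: IDs 0..1 are English, IDs 2..3 are Spanish.
vocab = en_vocab + es_vocab
es_offset = len(en_vocab)

def decode(token_id: int) -> tuple[str, str]:
    """Map an emitted token ID back to (token, language ID)."""
    lang = "en" if token_id < es_offset else "es"
    return vocab[token_id], lang

# A code-switched hypothesis: every token carries a language label.
print([decode(tid) for tid in [0, 2, 1, 3]])
# [('hello', 'en'), ('hola', 'es'), ('world', 'en'), ('mundo', 'es')]
```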
arXiv Detail & Related papers (2023-06-14T21:24:11Z)
- DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set [0.0]
Common Label Set (CLS) maps graphemes of various languages with similar sounds to common labels.
Since Indian languages are mostly phonetic, building a transliteration system to convert from native script to CLS is straightforward.
We propose a novel dual-decoder architecture (DuDe) for building multilingual systems.
arXiv Detail & Related papers (2022-10-30T04:01:26Z)
- LAE: Language-Aware Encoder for Monolingual and Multilingual ASR [87.74794847245536]
A novel language-aware encoder (LAE) architecture is proposed to handle both monolingual and multilingual speech by disentangling language-specific information.
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating between different languages at the frame level.
arXiv Detail & Related papers (2022-06-05T04:03:12Z)
- Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
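One way to picture such a code-switching scheme is to swap words into another language via a bilingual lexicon, producing near-miss utterances usable as hard negatives. The lexicon and substitution rule below are illustrative assumptions, not the paper's actual schemes.
```python
import random

# Hypothetical English -> Spanish lexicon; not the paper's resources.
lexicon = {"set": "poner", "an": "una", "alarm": "alarma"}

def code_switch(tokens: list[str], ratio: float = 0.5) -> list[str]:
    """Swap a fraction of translatable tokens into the other language,
    yielding a near-miss utterance usable as a hard negative."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in lexicon]
    if not candidates:
        return out
    k = max(1, int(len(candidates) * ratio))
    for i in random.sample(candidates, k):
        out[i] = lexicon[out[i]]
    return out

print(code_switch(["set", "an", "alarm", "for", "7am"]))
```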
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
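A minimal sketch of the distillation step described above: average the language-branch teachers' soft predictions and train a single student against that target. The logits, temperature, and loss form are illustrative; the paper's actual objectives are not reproduced here.
```python
import numpy as np

def softmax(z, temperature=2.0):
    """Temperature-softened distribution over logits."""
    e = np.exp((z - z.max()) / temperature)
    return e / e.sum()

# Toy logits from two language-branch teacher models and one student.
teacher_logits = [np.array([2.0, 0.5, -1.0]),
                  np.array([1.5, 0.8, -0.5])]
student_logits = np.array([1.0, 1.0, 0.0])

# Amalgamate teachers by averaging their soft predictions, then measure
# how far the student is from that target with a KL divergence.
target = np.mean([softmax(z) for z in teacher_logits], axis=0)
student = softmax(student_logits)
kd_loss = float(np.sum(target * (np.log(target) - np.log(student))))
print(kd_loss)  # loss the single student model would minimize
```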
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.