The ASRU 2019 Mandarin-English Code-Switching Speech Recognition
Challenge: Open Datasets, Tracks, Methods and Results
- URL: http://arxiv.org/abs/2007.05916v1
- Date: Sun, 12 Jul 2020 05:38:57 GMT
- Title: The ASRU 2019 Mandarin-English Code-Switching Speech Recognition
Challenge: Open Datasets, Tracks, Methods and Results
- Authors: Xian Shi, Qiangze Feng, Lei Xie
- Abstract summary: This paper describes the design and main outcomes of the ASRU 2019 Mandarin-English code-switching speech recognition challenge.
500 hours of Mandarin speech data and 240 hours of Mandarin-English intra-sentential CS data were released to the participants.
- Score: 9.089285414356969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-switching (CS) is a common phenomenon, and recognizing CS speech is
challenging. However, CS speech data is scarce and there is no common testbed in
relevant research. This paper describes the design and main outcomes of the
ASRU 2019 Mandarin-English code-switching speech recognition challenge, which
aims to improve ASR performance in Mandarin-English code-switching
situations. 500 hours of Mandarin speech data and 240 hours of Mandarin-English
intra-sentential CS data were released to the participants. Three tracks were
set up to advance the acoustic model (AM) and language model (LM) components of
traditional DNN-HMM ASR systems, as well as to explore the performance of E2E
models. The paper then presents an overview of the results and system
performance in the three tracks. It turns out that traditional ASR systems
benefit from pronunciation lexicon design, CS text generation and data
augmentation. In the E2E track, however, the results highlight the importance
of using language identification, building a rational set of modeling units,
and applying SpecAugment. Further details of model training and method
comparison are also discussed.
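The SpecAugment technique highlighted in the E2E track amounts to masking random frequency bands and time spans of the input spectrogram during training. The sketch below illustrates the idea on a log-mel spectrogram stored as a NumPy array; the mask counts and widths are illustrative assumptions, not the values used by the challenge systems.

```python
import numpy as np

def spec_augment(log_mel, num_freq_masks=2, freq_mask_width=8,
                 num_time_masks=2, time_mask_width=20, rng=None):
    """SpecAugment-style masking on a log-mel spectrogram of shape
    (num_mel_bins, num_frames). Parameter values here are
    hypothetical defaults, not the challenge systems' settings."""
    rng = rng or np.random.default_rng()
    spec = log_mel.copy()
    num_bins, num_frames = spec.shape

    # Frequency masking: zero out contiguous bands of mel bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, freq_mask_width + 1))
        start = int(rng.integers(0, max(1, num_bins - width)))
        spec[start:start + width, :] = 0.0

    # Time masking: zero out contiguous spans of frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, time_mask_width + 1))
        start = int(rng.integers(0, max(1, num_frames - width)))
        spec[:, start:start + width] = 0.0
    return spec
```

The full method in the SpecAugment paper also includes time warping, which is omitted here for brevity.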
Related papers
- Speech collage: code-switched audio generation by collaging monolingual
corpora [50.356820349870986]
Speech Collage is a method that synthesizes CS data from monolingual corpora by splicing audio segments.
We investigate the impact of generated data on speech recognition in two scenarios.
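The splicing idea can be sketched as concatenating pre-segmented clips drawn from each monolingual pool according to a switch pattern. The `collage` helper and its parameters below are hypothetical; the actual Speech Collage method additionally smooths and normalizes energy at the splice points, which is omitted here.

```python
import numpy as np

def collage(zh_segments, en_segments, pattern, rng=None):
    """Splice word/phrase clips from monolingual Mandarin and English
    pools into one code-switched waveform.

    zh_segments / en_segments: lists of 1-D numpy arrays (audio clips).
    pattern: switch order, e.g. ["zh", "en", "zh"].
    Illustrative sketch only; no overlap-add smoothing is applied.
    """
    rng = rng or np.random.default_rng()
    pools = {"zh": zh_segments, "en": en_segments}
    pieces = [pools[lang][int(rng.integers(len(pools[lang])))]
              for lang in pattern]
    return np.concatenate(pieces)
```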
arXiv Detail & Related papers (2023-09-27T14:17:53Z)
- Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer [17.700515986659063]
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.
This paper proposes a new method for creating code-switching ASR datasets from purely monolingual data sources.
A novel Concatenated Tokenizer enables ASR models to generate language ID for each emitted text token while reusing existing monolingual tokenizers.
arXiv Detail & Related papers (2023-06-14T21:24:11Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition [38.60303603000269]
We propose to use a novel pronunciation-aware unique character encoding for building E2E RNN-T-based Mandarin ASR systems.
The proposed encoding is a combination of a pronunciation-based syllable and a character index (CI).
arXiv Detail & Related papers (2022-07-29T09:49:10Z)
- AISHELL-NER: Named Entity Recognition from Chinese Speech [54.434118596263126]
We introduce a new dataset, AISHELL-NER, for NER from Chinese speech.
The results demonstrate that the performance can be improved by combining entity-aware ASR and a pretrained NER tagger.
arXiv Detail & Related papers (2022-02-17T09:18:48Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses the recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian Languages Code-Switching Challenge [7.711092265101041]
We present the Kanari/QCRI system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages.
The subtask involved developing a speech recognition system for two CS datasets: Hindi-English and Bengali-English.
To tackle the CS challenges, we use transfer learning for incorporating the publicly available monolingual Hindi, Bengali, and English speech data.
arXiv Detail & Related papers (2021-06-10T16:12:51Z)
- Streaming End-to-End Bilingual ASR Systems with Joint Language Identification [19.09014345299161]
We introduce streaming, end-to-end, bilingual systems that perform both ASR and language identification.
The proposed method is applied to two language pairs: English-Spanish as spoken in the United States, and English-Hindi as spoken in India.
arXiv Detail & Related papers (2020-07-08T05:00:25Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem.
We use the language identities to bias the model to predict the CS points.
This encourages the model to learn the language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
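The language-bias idea of learning switch points from tagged transcriptions can be illustrated with a small preprocessing helper that interleaves language-identity tokens into a mixed transcript. The `<zh>`/`<en>` tag names and the Unicode-range heuristic below are assumptions for illustration, not the paper's exact scheme.

```python
def insert_language_tags(transcript):
    """Interleave <zh>/<en> language-identity tags into a
    space-separated Mandarin-English transcript, so an E2E model can
    learn code-switch points from the transcription alone.

    Heuristic (an assumption): tokens containing CJK Unified
    Ideographs (U+4E00-U+9FFF) are Mandarin; everything else English.
    """
    def lang_of(token):
        return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in token) else "en"

    tagged, prev = [], None
    for token in transcript.split():
        lang = lang_of(token)
        if lang != prev:  # emit a tag only at switch points
            tagged.append(f"<{lang}>")
            prev = lang
        tagged.append(token)
    return " ".join(tagged)
```

Tags emitted this way become ordinary output units of the model, which is how the transcription itself can carry the LID signal.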
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.