Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge
- URL: http://arxiv.org/abs/2210.06091v2
- Date: Thu, 13 Oct 2022 08:26:45 GMT
- Title: Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge
- Authors: Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang
Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang and Yonghong Yan
- Abstract summary: The ISCSLP 2022 CSASR challenge provided two training sets, the TAL_CSASR corpus and the MagicData-RAMC corpus, along with a development set and a test set for participants.
More than 40 teams participated in this challenge, and the winning team achieved a 16.70% Mixture Error Rate (MER) on the test set.
In this paper, we describe the datasets, the associated baseline system and the requirements, and summarize the CSASR challenge results and the major techniques and tricks used in the submitted systems.
- Score: 25.69349931845173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-switching automatic speech recognition has become one of the most
challenging and most valuable scenarios in automatic speech recognition, owing
to switching between multiple languages and the frequent occurrence of
code-switching in daily life. The ISCSLP 2022 Chinese-English Code-Switching
Automatic Speech Recognition (CSASR) Challenge aims to promote the development
of code-switching automatic speech recognition. The ISCSLP 2022 CSASR challenge
provided two training sets, the TAL_CSASR corpus and the MagicData-RAMC corpus,
along with a development set and a test set for participants, which are used for
CSASR model training and evaluation. Along with the challenge, we also provide
the baseline system performance for reference. More than 40 teams participated
in this challenge, and the winning team achieved a 16.70% Mixture Error Rate
(MER) on the test set, a 9.8% absolute MER improvement over the baseline system.
In this paper, we describe the datasets, the associated baseline system and the
requirements, and summarize the CSASR challenge results and the major
techniques and tricks used in the submitted systems.
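The Mixture Error Rate (MER) reported above is commonly computed by treating each Chinese character and each English word as one token, then taking the token-level edit distance against the reference. The sketch below illustrates that idea; the tokenization rules and any text normalization here are illustrative assumptions, not the challenge's official scoring script.

```python
# Hedged sketch of a Mixture Error Rate (MER) computation for
# Chinese-English code-switched text. Assumption: Chinese is scored per
# character and English per whitespace-separated word; the official
# challenge scorer may normalize text differently.
import re

def tokenize_mixed(text: str) -> list[str]:
    """Split a code-switched string into CJK characters and English words."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # dp[j] is still the previous row's value here; prev is the
            # previous row's diagonal entry.
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def mixture_error_rate(ref: str, hyp: str) -> float:
    """Token edit distance divided by the number of reference tokens."""
    ref_toks, hyp_toks = tokenize_mixed(ref), tokenize_mixed(hyp)
    return edit_distance(ref_toks, hyp_toks) / max(len(ref_toks), 1)
```

For example, `mixture_error_rate("我爱 you", "我恨 you")` counts one substitution among three reference tokens, giving an MER of about 33.3%.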
Related papers
- ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech
Recognition Challenge [94.13624830833314]
This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
arXiv Detail & Related papers (2024-01-07T12:51:42Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes the representation of each modality by fusing them at different levels of the audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z)
- A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge [33.89616011003973]
We describe our proposed spoken semantic parsing system for the quality track (Track 1) in the Spoken Language Understanding Grand Challenge.
Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are used within our SLU framework to boost performance.
We also investigate output-level combination of various models, reaching an exact-match accuracy of 80.8, which won 1st place in the challenge.
arXiv Detail & Related papers (2023-05-02T17:25:19Z)
- Code Switched and Code Mixed Speech Recognition for Indic languages [0.0]
Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific.
We compare the performance of an end-to-end multilingual speech recognition system to that of monolingual models conditioned on language identification (LID).
We also propose a similar technique for the code-switched setting, achieving WERs of 21.77 and 28.27 on Hindi-English and Bengali-English, respectively.
arXiv Detail & Related papers (2022-03-30T18:09:28Z)
- OLR 2021 Challenge: Datasets, Rules and Baselines [23.878103387338918]
The data profile, four tasks, two baselines, and the evaluation principles are introduced in this paper.
In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to the OLR 2021 Challenge for the first time.
arXiv Detail & Related papers (2021-07-23T09:57:29Z)
- Arabic Code-Switching Speech Recognition using Monolingual Data [13.513655231184261]
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization.
Recent research in multilingual ASR shows potential improvement over monolingual systems.
We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments.
arXiv Detail & Related papers (2021-07-04T08:40:49Z)
- Auto-KWS 2021 Challenge: Task, Datasets, and Baselines [63.82759886293636]
Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.
The challenge focuses on the problem of customized keyword spotting, where the target device can only be awakened by an enrolled speaker with his specified keyword.
arXiv Detail & Related papers (2021-03-31T14:56:48Z)
- The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS [66.06385966689965]
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice conversion challenge (VCC) 2020.
We consider a naive approach for voice conversion (VC), which is to first transcribe the input speech with an automatic speech recognition (ASR) model.
We revisit this method under a sequence-to-sequence (seq2seq) framework by utilizing ESPnet, an open-source end-to-end speech processing toolkit.
arXiv Detail & Related papers (2020-10-06T02:27:38Z)
- The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation [49.41766997393417]
This report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6.
Our submission focuses on solving two indeterminacy problems in automated audio captioning: word selection indeterminacy and sentence length indeterminacy.
We simultaneously solve the main caption generation task and the sub-indeterminacy problems by estimating keywords and sentence length through multi-task learning.
arXiv Detail & Related papers (2020-07-01T04:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.