ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech
Recognition Challenge
- URL: http://arxiv.org/abs/2401.03473v3
- Date: Wed, 21 Feb 2024 03:39:37 GMT
- Title: ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech
Recognition Challenge
- Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei
Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao
Wang, Eng Siong Chng, Sun Li
- Abstract summary: This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To promote speech processing and recognition research in driving scenarios,
we build on the success of the Intelligent Cockpit Speech Recognition Challenge
(ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel
Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over
100 hours of multi-channel speech data recorded inside a new energy vehicle and
40 hours of noise for data augmentation. Two tracks, automatic speech
recognition (ASR) and automatic speech diarization and recognition (ASDR), are
set up, using character error rate (CER) and concatenated minimum-permutation
character error rate (cpCER) as evaluation metrics, respectively. Overall, the
ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid
results in both tracks. In the end, the first-place team USTCiflytek achieves a CER
of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an
absolute improvement of 13.08% and 51.4% compared to our challenge baseline,
respectively.
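For readers unfamiliar with the two metrics, below is a minimal Python sketch of how CER and cpCER are commonly computed: CER is the character-level edit distance divided by the reference length, and cpCER concatenates each speaker's utterances and scores under the speaker mapping with the fewest edits. The function names and the brute-force permutation search are illustrative assumptions, not the official challenge scoring code.

```python
# Minimal sketch of the two challenge metrics. Function names and the
# brute-force permutation search are illustrative assumptions, not the
# official ICMC-ASR scoring tooling.
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance via a one-row DP table."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if chars match)
            )
    return dp[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def cpcer(refs: list[str], hyps: list[str]) -> float:
    """Concatenated minimum-permutation CER. Each list element is one
    speaker's concatenated utterances; assumes equal speaker counts
    (real scoring pads the shorter side with empty strings)."""
    total_ref = sum(len(r) for r in refs)
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / max(total_ref, 1)
```

For instance, cer("abcd", "abxd") is 0.25. The factorial search over speaker permutations is workable for the few speakers in a car cabin; with many speakers one would solve the speaker assignment with the Hungarian algorithm instead.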
Related papers
- ICASSP 2024 Speech Signal Improvement Challenge
The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems.
We enhance the competition by introducing a dataset synthesizer, enabling all participating teams to start at a higher baseline.
We evaluate a total of 13 systems in the real-time track and 11 systems in the non-real-time track using both subjective P.804 and objective Word Accuracy metrics.
arXiv Detail & Related papers (2024-01-25T18:08:00Z)
- The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.
In terms of data processing, we leverage the lip motion extractor from the baseline to produce multi-scale video data.
Various augmentation techniques are applied during training, encompassing speed perturbation, random rotation, horizontal flipping, and color transformation (illustrated in the sketch after this entry).
arXiv Detail & Related papers (2024-01-07T14:20:52Z)
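As a concrete illustration of that augmentation recipe, here is a hedged torchvision sketch; the parameter values, the (T, C, H, W) clip layout, and the speed_perturb helper are assumptions for illustration, not the NPU-ASLP-LiAuto pipeline.

```python
# Illustrative torchvision sketch of the augmentations named above. All
# parameter values and the speed_perturb helper are assumptions, not the
# NPU-ASLP-LiAuto system's actual pipeline.
import torch
from torchvision import transforms

# One set of random parameters is drawn per call and applied to the whole
# clip, so rotation/flip/color stay temporally consistent across frames.
frame_transform = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
])


def speed_perturb(video: torch.Tensor, factor: float) -> torch.Tensor:
    """Resample the time axis of a (T, C, H, W) clip; factor > 1 speeds
    the clip up (fewer frames), factor < 1 slows it down."""
    t = video.shape[0]
    new_t = max(1, round(t / factor))
    idx = torch.linspace(0, t - 1, new_t).round().long()
    return video[idx]


clip = torch.rand(25, 3, 96, 96)        # dummy 25-frame RGB lip clip
clip = speed_perturb(clip, factor=1.1)  # temporal augmentation
clip = frame_transform(clip)            # spatial/color augmentation
```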
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
We propose a multi-layer cross-attention fusion based AVSR approach that strengthens each modality's representation by fusing the two streams at different levels of the audio/visual encoders (see the sketch after this entry).
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
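To make the fusion idea concrete, below is a minimal PyTorch sketch of cross-attention fusion applied at several encoder depths; module names, layer counts, and sizes are illustrative assumptions, not the MLCA-AVSR implementation.

```python
# Hedged PyTorch sketch of multi-layer cross-attention fusion between an
# audio and a visual encoder. Names, depths, and sizes are illustrative
# assumptions, not the MLCA-AVSR implementation.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """One stream attends to the other; a residual keeps its own features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        fused, _ = self.attn(query=x, key=other, value=other)
        return self.norm(x + fused)


class TwoStreamEncoder(nn.Module):
    """Audio/visual Transformer stacks fused at every layer, not just once
    at the end, so shallow and deep features both see the other modality."""

    def __init__(self, dim: int = 256, depth: int = 3):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.audio = nn.ModuleList(make() for _ in range(depth))
        self.visual = nn.ModuleList(make() for _ in range(depth))
        self.a_from_v = nn.ModuleList(CrossAttentionFusion(dim) for _ in range(depth))
        self.v_from_a = nn.ModuleList(CrossAttentionFusion(dim) for _ in range(depth))

    def forward(self, audio: torch.Tensor, video: torch.Tensor):
        for enc_a, enc_v, fuse_a, fuse_v in zip(
            self.audio, self.visual, self.a_from_v, self.v_from_a
        ):
            audio, video = enc_a(audio), enc_v(video)
            # Both fusions read the same pre-fusion features (tuple on the
            # right is evaluated before assignment), keeping fusion symmetric.
            audio, video = fuse_a(audio, video), fuse_v(video, audio)
        return audio, video  # fed to a downstream ASR decoder


# Example: batch of 2, 50 audio frames and 25 video frames, dim 256.
a, v = torch.rand(2, 50, 256), torch.rand(2, 25, 256)
out_a, out_v = TwoStreamEncoder()(a, v)
```

The residual inside CrossAttentionFusion means fusion at an early layer cannot overwrite modality-specific features that later layers still need.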
- The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge
This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.
Our submitted system includes multi-channel front-end enhancement and diarization, training-data augmentation, and speech recognition modeling with multi-channel branches.
Tested on the official Eval1 and Eval2 sets, our best system achieves a relative 34.3% improvement in CER and a 56.5% improvement in cpCER, compared to the official baseline system.
arXiv Detail & Related papers (2023-12-26T11:11:22Z)
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
arXiv Detail & Related papers (2023-02-20T19:27:14Z)
- Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge
The ISCSLP 2022 CSASR challenge provided two training sets, the TAL_CSASR corpus and the MagicData-RAMC corpus, as well as a development set and a test set for participants.
More than 40 teams participated in this challenge, and the winning team achieved a 16.70% Mixture Error Rate (MER) on the test set.
In this paper, we describe the datasets, the associated baseline system, and the requirements, and summarize the CSASR challenge results and the major techniques and tricks used in the submitted systems.
arXiv Detail & Related papers (2022-10-12T11:05:13Z)
- The NIST CTS Speaker Recognition Challenge
The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS Challenge since August 2020.
This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge.
arXiv Detail & Related papers (2022-04-21T16:06:27Z)
- CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
We introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR).
It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers.
We provide detailed statistics of both the clean and the augmented versions of our dataset.
arXiv Detail & Related papers (2022-01-11T06:32:12Z)