The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge
- URL: http://arxiv.org/abs/2312.16002v1
- Date: Tue, 26 Dec 2023 11:11:22 GMT
- Title: The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge
- Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan
Yildirim, Shuai Wang, Haizhou Li, Mengling Feng
- Abstract summary: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.
Our submitted systems include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches.
Tested on the official Eval1 and Eval2 sets, our best system achieves a relative 34.3% improvement in CER and a 56.5% improvement in cpCER compared to the official baseline system.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR
Challenge for in-car multi-channel automatic speech recognition. Our submitted
systems for the ICMC-ASR Challenge include multi-channel front-end enhancement
and diarization, training data augmentation, and speech recognition modeling
with multi-channel branches. Tested on the official Eval1 and Eval2 sets, our
best system achieves a relative 34.3% improvement in CER and a 56.5%
improvement in cpCER compared to the official baseline system.
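For clarity, the relative improvements quoted above follow the usual definition of relative error-rate reduction. A minimal sketch, using made-up error rates rather than the official challenge numbers:

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative error-rate reduction, applicable to CER or cpCER."""
    return (baseline - system) / baseline

# Hypothetical error rates for illustration only (not the official results):
# a system CER of 17.1% against a baseline CER of 26.0% corresponds to a
# relative improvement of roughly 34.2%.
print(f"{relative_improvement(0.260, 0.171):.1%}")
```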
Related papers
- The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in
CNVSRC 2023 [67.11294606070278]
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.
In terms of data processing, we leverage the lip motion extractor from the baseline system to produce multi-scale video data.
Various augmentation techniques are applied during training, encompassing speed perturbation, random rotation, horizontal flipping, and color transformation.
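As a rough illustration of the augmentations named above (speed perturbation, random rotation, horizontal flipping, and color transformation), the sketch below applies comparable operations to a video clip held as a NumPy array; all parameter values are assumptions, since the abstract does not state them.

```python
import numpy as np
from scipy.ndimage import rotate

def augment_clip(frames: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply simple augmentations to a video clip of shape (T, H, W, C).

    Parameter ranges are illustrative assumptions, not the paper's settings.
    """
    # Speed perturbation: resample the frame index at a random rate in [0.9, 1.1].
    rate = rng.uniform(0.9, 1.1)
    idx = np.clip(np.round(np.arange(0, len(frames), rate)).astype(int),
                  0, len(frames) - 1)
    frames = frames[idx].astype(np.float32)

    # Random rotation of each frame by a small angle in the image plane.
    angle = rng.uniform(-10.0, 10.0)
    frames = rotate(frames, angle, axes=(1, 2), reshape=False, order=1)

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        frames = frames[:, :, ::-1, :]

    # Simple color transformation: random brightness scaling.
    frames = frames * rng.uniform(0.8, 1.2)

    return np.clip(frames, 0, 255).astype(np.uint8)
```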
arXiv Detail & Related papers (2024-01-07T14:20:52Z)
- ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech
Recognition Challenge [94.13624830833314]
This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
arXiv Detail & Related papers (2024-01-07T12:51:42Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes the representation of each modality by fusing them at different levels of the audio and visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
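A minimal PyTorch sketch of the general idea, assuming a symmetric cross-attention block applied at several encoder depths; the dimensions and details are placeholders, not the authors' MLCA-AVSR architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusionLayer(nn.Module):
    """Fuse audio and visual features at one encoder level via cross-attention.

    Illustrative only; sizes and the exact fusion scheme are assumptions.
    """
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio: torch.Tensor, video: torch.Tensor):
        # Audio queries attend to visual keys/values, and vice versa;
        # each stream keeps a residual connection to its own features.
        a_ctx, _ = self.a2v(audio, video, video)
        v_ctx, _ = self.v2a(video, audio, audio)
        return self.norm_a(audio + a_ctx), self.norm_v(video + v_ctx)

# Usage: apply one such fusion block after each of several encoder layers,
# so the two modalities exchange information at multiple depths.
audio = torch.randn(2, 100, 256)   # (batch, audio frames, dim)
video = torch.randn(2, 25, 256)    # (batch, video frames, dim)
audio, video = CrossModalFusionLayer()(audio, video)
```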
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Two-pass Decoding and Cross-adaptation Based System Combination of
End-to-end Conformer and Hybrid TDNN ASR Systems [61.90743116707422]
This paper investigates multi-pass rescoring and cross-adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems.
The best combined system obtained using multi-pass rescoring produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the stand-alone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.
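A toy sketch of the rescoring step in such a combination: first-pass N-best hypotheses are rescored by a second system and the two scores are interpolated. The interpolation weight and the scoring callable are assumptions, not values from the paper.

```python
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],           # (hypothesis, first-pass log score)
    second_pass_score: Callable[[str], float],
    weight: float = 0.5,                      # interpolation weight, assumed value
) -> str:
    """Return the hypothesis with the best interpolated two-pass score."""
    rescored = [
        (hyp, (1.0 - weight) * first + weight * second_pass_score(hyp))
        for hyp, first in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])[0]

# Hypothetical usage with a dummy second-pass scorer standing in for, e.g.,
# a Conformer E2E model rescoring hybrid TDNN hypotheses:
nbest = [("hello world", -12.3), ("hello word", -12.1)]
print(rescore_nbest(nbest, second_pass_score=lambda hyp: -0.5 * len(hyp.split())))
```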
arXiv Detail & Related papers (2022-06-23T10:17:13Z)
- The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party
meeting transcription (M2MeT) challenge [43.262531688434215]
We propose two improvements to target-speaker voice activity detection (TS-VAD).
These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and under heavily reverberant and noisy conditions.
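For orientation, a generic target-speaker VAD sketch is given below: frame-level features are conditioned on each enrolled speaker's embedding and a per-speaker speech-activity score is predicted for every frame. This is a minimal generic shape, not the USTC-Ximalaya model or its proposed improvements.

```python
import torch
import torch.nn as nn

class TinyTSVAD(nn.Module):
    """Per-speaker speech-activity prediction (generic TS-VAD sketch)."""
    def __init__(self, feat_dim: int = 80, spk_dim: int = 128, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim + spk_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats: torch.Tensor, spk_embs: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, feat_dim) frame features; spk_embs: (B, S, spk_dim) enrolled speakers.
        B, T, _ = feats.shape
        S = spk_embs.size(1)
        x = torch.cat(
            [feats.unsqueeze(1).expand(B, S, T, -1),
             spk_embs.unsqueeze(2).expand(B, S, T, -1)],
            dim=-1,
        ).reshape(B * S, T, -1)
        h, _ = self.rnn(x)
        # One speech/non-speech logit per speaker and frame: (B, S, T).
        return self.head(h).reshape(B, S, T)

# Hypothetical usage: 2 utterances, 4 enrolled speakers, 200 frames of 80-dim features.
activity = TinyTSVAD()(torch.randn(2, 200, 80), torch.randn(2, 4, 128))  # -> (2, 4, 200)
```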
arXiv Detail & Related papers (2022-02-10T06:06:48Z)
- Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel
Multi-party Meeting Transcription Challenge [4.022057598291766]
This paper describes the Royalflush speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription Challenge.
The system comprises speech enhancement, overlapped speech detection, speaker embedding extraction, speaker clustering, speech separation, and system fusion.
arXiv Detail & Related papers (2022-02-10T03:35:05Z)
- The Volcspeech system for the ICASSP 2022 multi-channel multi-party
meeting transcription challenge [18.33054364289739]
This paper describes our submission to the ICASSP 2022 Multi-channel Multi-party Meeting Transcription (M2MeT) Challenge.
For Track 1, we propose several approaches to improve the clustering-based speaker diarization system.
For Track 2, we develop our system using the Conformer model in a joint CTC-attention architecture.
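For context, a joint CTC-attention architecture is usually trained with an interpolated objective L = λ·L_CTC + (1 − λ)·L_attention. A minimal sketch, assuming generic tensor shapes and a placeholder weight rather than the Volcspeech configuration:

```python
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(
    ctc_logits: torch.Tensor,      # (T, B, vocab) encoder outputs for CTC
    att_logits: torch.Tensor,      # (B, L, vocab) attention decoder outputs
    targets: torch.Tensor,         # (B, L) token ids, padded with pad_id
    input_lengths: torch.Tensor,   # (B,) encoder frame lengths
    target_lengths: torch.Tensor,  # (B,) label lengths
    ctc_weight: float = 0.3,       # generic value, not from the paper
    pad_id: int = 0,
) -> torch.Tensor:
    """Interpolate the CTC and attention losses of a joint CTC-attention model."""
    ctc = F.ctc_loss(
        ctc_logits.log_softmax(-1), targets, input_lengths, target_lengths,
        blank=0, zero_infinity=True,
    )
    att = F.cross_entropy(
        att_logits.transpose(1, 2), targets, ignore_index=pad_id,
    )
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```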
arXiv Detail & Related papers (2022-02-09T03:38:39Z)
- USTC-NELSLIP System Description for DIHARD-III Challenge [78.40959509760488]
The innovation of our system lies in the combination of various front-end techniques to solve the diarization problem.
Our best system achieved DERs of 11.30% in track 1 and 16.78% in track 2 on the evaluation set.
arXiv Detail & Related papers (2021-03-19T07:00:51Z)
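For reference, diarization error rate (DER) is the sum of missed speech, false alarm, and speaker confusion time divided by the total scored speech time; the durations in the sketch below are hypothetical and chosen only to illustrate the computation.

```python
def der(missed: float, false_alarm: float, confusion: float, total_speech: float) -> float:
    """Diarization error rate: (missed + false alarm + confusion) / total speech time."""
    return (missed + false_alarm + confusion) / total_speech

# Hypothetical durations in seconds, for illustration only:
print(f"{der(120.0, 80.0, 26.0, 2000.0):.2%}")  # -> 11.30%
```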