ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023
- URL: http://arxiv.org/abs/2308.08181v1
- Date: Wed, 16 Aug 2023 07:21:01 GMT
- Title: ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023
- Authors: Mengjie Du and Xiang Fang and Jie Li
- Abstract summary: Our system consists of several ResNet variants trained only on VoxCeleb2, which were later fused for better performance.
The final submission achieved a minDCF of 0.1066 and an EER of 1.980%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report describes the ChinaTelecom system for Track 1 (closed) of
the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). Our system
consists of several ResNet variants trained only on VoxCeleb2, which were later
fused for better performance. Score calibration was also applied to each
variant and to the fused system. The final submission achieved a minDCF of 0.1066
and an EER of 1.980%.
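The abstract reports minDCF and EER for a fused, calibrated system. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch that computes both from target and non-target trial scores, plus a simple weighted-sum fusion of subsystem scores. The function names, the fixed fusion weights, and the cost parameters (P_target = 0.05, C_miss = C_fa = 1, which I understand to be the VoxSRC setting but which this report does not state) are assumptions; this is not the ChinaTelecom implementation.

```python
from typing import List, Sequence, Tuple


def det_curve(target_scores: Sequence[float],
              nontarget_scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (P_miss, P_fa) at every distinct threshold; accept when score >= threshold."""
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    # Label target trials 1 and non-target trials 0, then sort by score (ascending).
    trials = sorted([(s, 1) for s in target_scores] + [(s, 0) for s in nontarget_scores])
    p_miss, p_fa = [0.0], [1.0]  # threshold below every score: accept everything
    misses, false_alarms = 0, n_non
    i = 0
    while i < len(trials):
        score = trials[i][0]
        # Moving the threshold just above `score` rejects every trial with that score.
        while i < len(trials) and trials[i][0] == score:
            if trials[i][1] == 1:
                misses += 1
            else:
                false_alarms -= 1
            i += 1
        p_miss.append(misses / n_tar)
        p_fa.append(false_alarms / n_non)
    return p_miss, p_fa


def compute_eer(target_scores: Sequence[float], nontarget_scores: Sequence[float]) -> float:
    """Approximate EER: mean of P_miss and P_fa at the threshold where they are closest."""
    p_miss, p_fa = det_curve(target_scores, nontarget_scores)
    k = min(range(len(p_miss)), key=lambda j: abs(p_miss[j] - p_fa[j]))
    return (p_miss[k] + p_fa[k]) / 2


def compute_min_dcf(target_scores: Sequence[float], nontarget_scores: Sequence[float],
                    p_target: float = 0.05, c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Normalised minimum detection cost over all thresholds (parameters are assumptions)."""
    p_miss, p_fa = det_curve(target_scores, nontarget_scores)
    c_det = min(c_miss * pm * p_target + c_fa * pf * (1 - p_target)
                for pm, pf in zip(p_miss, p_fa))
    # Normalise by the cost of a trivial system that always accepts or always rejects.
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    return c_det / c_default


def fuse(score_lists: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Hypothetical linear fusion: weighted sum of each subsystem's score for the same trial."""
    return [sum(w * s for w, s in zip(weights, trial)) for trial in zip(*score_lists)]


if __name__ == "__main__":
    # Toy scores from two hypothetical subsystems on the same three trials.
    tar = fuse([[2.1, 1.4, 0.9], [1.8, 1.1, 1.2]], [0.5, 0.5])
    non = fuse([[0.2, -0.3, 1.0], [0.1, 0.0, 0.7]], [0.5, 0.5])
    print(compute_eer(tar, non), compute_min_dcf(tar, non))
```

One design consequence worth noting: an affine calibration s' = a * s + b with a > 0 preserves the ordering of trials, so it leaves EER and minDCF unchanged; calibration pays off in threshold-dependent measures such as actual DCF.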
Related papers
- The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.
In terms of data processing, we leverage the lip motion extractor from the baseline to produce multi-scale video data.
Various augmentation techniques are applied during training, encompassing speed perturbation, random rotation, horizontal flipping, and color transformation.
Date: 2024-01-07
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
We propose a multi-layer cross-attention fusion based AVSR approach that improves the representation of each modality by fusing them at different levels of the audio and visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
Date: 2024-01-07
- UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023
This report describes the UNISOUND submission for Track 1 and Track 2 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023).
We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev.
We propose a consistency-aware score calibration method, which leverages the stability of audio voiceprints in similarity scores through a Consistency Measure Factor (CMF).
Date: 2023-08-24
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
Date: 2023-02-20
- The Newsbridge-Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description
We describe the system used by our team for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC 2022) in the speaker diarization track.
Our solution was designed around a new combination of voice activity detection algorithms that uses the strengths of several systems.
Date: 2023-01-17
- THUEE system description for NIST 2020 SRE CTS challenge
This paper presents the system description of the THUEE team for the NIST 2020 Speaker Recognition Evaluation (SRE) conversational telephone speech (CTS) challenge.
Subsystems including ResNet74, ResNet152, and RepVGG-B2 were developed as speaker embedding extractors for this evaluation.
Date: 2022-10-12
- STC speaker recognition systems for the NIST SRE 2021
This paper presents a description of STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation.
These systems consist of a number of diverse subsystems that use deep neural networks as feature extractors.
For the video modality, we developed our best solution with a RetinaFace face detector and a deep ResNet face embedding extractor trained on large face image datasets.
Date: 2021-11-03
- The Phonexia VoxCeleb Speaker Recognition Challenge 2021 System Description
We describe the Phonexia submission for the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21) in the unsupervised speaker verification track.
An embedding extractor was bootstrapped using momentum contrastive learning, with input augmentations as the only source of supervision.
Score fusion was done by averaging the ZT-normalized cosine scores of five different embedding extractors.
Date: 2021-09-05
- USTC-NELSLIP System Description for DIHARD-III Challenge
The innovation of our system lies in the combination of various front-end techniques to solve the diarization problem.
Our best system achieved DERs of 11.30% in track 1 and 16.78% in track 2 on the evaluation set.
Date: 2021-03-19
- The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge.
The system outputs the ensemble results of the five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem.
Date: 2021-02-02
- Tongji University Undergraduate Team for the VoxCeleb Speaker Recognition Challenge 2020
We applied the RSBU-CW module to the ResNet34 framework to improve the denoising ability of the network.
We trained two ResNet variants and used score fusion and data augmentation to improve the performance of the model.
Date: 2020-10-20
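The Phonexia entry fuses five embedding extractors by averaging ZT-normalized cosine scores, and several other entries here also rely on score normalization and fusion. A minimal pure-Python sketch of that scoring scheme follows. It is not Phonexia's code: the function names, the impostor set, the cohort, and the toy 2-D embeddings are hypothetical, and the ZT-norm variant shown (Z-norm the trial score, then T-norm it against cohort scores that are themselves Z-normed with each cohort model's impostor statistics) is one common formulation, assumed here.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def z_norm(score: float, reference_scores: Sequence[float]) -> float:
    """Standardise `score` with the mean and population std of `reference_scores`."""
    mu = statistics.fmean(reference_scores)
    sd = statistics.pstdev(reference_scores)  # assumes sd > 0
    return (score - mu) / sd


def zt_norm(enroll: Vector, test: Vector,
            impostors: Sequence[Vector], cohort: Sequence[Vector]) -> float:
    """ZT-norm: Z-norm the trial score, then T-norm it against Z-normed cohort scores."""
    # Z-norm with the enrollment model's scores against the (hypothetical) impostor set.
    z = z_norm(cosine(enroll, test), [cosine(enroll, imp) for imp in impostors])
    # Score the test utterance against every cohort model; Z-norm each cohort score
    # with that cohort model's own impostor statistics so both sides share a scale.
    cohort_z = [z_norm(cosine(c, test), [cosine(c, imp) for imp in impostors])
                for c in cohort]
    # T-norm: standardise the Z-normed trial score with the cohort score distribution.
    return z_norm(z, cohort_z)


Trial = Tuple[Vector, Vector, Sequence[Vector], Sequence[Vector]]


def fused_score(per_extractor: List[Trial]) -> float:
    """Average the ZT-normed cosine scores computed with each extractor's embeddings."""
    return statistics.fmean(zt_norm(*trial) for trial in per_extractor)


if __name__ == "__main__":
    impostors = [[0.0, 1.0], [-1.0, 0.5], [0.3, -1.0]]
    cohort = [[0.5, 0.5], [-0.2, 1.0], [1.0, -0.5]]
    trial = ([1.0, 0.2], [0.9, 0.3], impostors, cohort)
    # Two "extractors" that happen to produce identical toy embeddings.
    print(fused_score([trial, trial]))
```

Averaging after normalization, rather than averaging raw cosines, keeps one extractor with a wider score range from dominating the fused score.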