Tongji University Undergraduate Team for the VoxCeleb Speaker
Recognition Challenge 2020
- URL: http://arxiv.org/abs/2010.10145v1
- Date: Tue, 20 Oct 2020 09:25:40 GMT
- Title: Tongji University Undergraduate Team for the VoxCeleb Speaker
Recognition Challenge 2020
- Authors: Shufan Shen, Ran Miao, Yi Wang, Zhihua Wei
- Abstract summary: We applied the RSBU-CW module to the ResNet34 framework to improve the denoising ability of the network.
We trained two variants of ResNet and used score fusion and data-augmentation methods to improve the performance of the model.
- Score: 10.836635938778684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we describe the submission of the Tongji University
undergraduate team to the CLOSE track of the VoxCeleb Speaker Recognition
Challenge (VoxSRC) 2020 at Interspeech 2020. We applied the RSBU-CW module to
the ResNet34 framework to improve the denoising ability of the network and
better complete the speaker verification task in a complex environment. We
trained two variants of ResNet and used score fusion and data-augmentation
methods to improve the performance of the model. Our fusion of two selected
systems for the CLOSE track achieves 0.2973 DCF and 4.9700% EER on the
challenge evaluation set.
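The abstract reports fusing the scores of two systems and evaluating with DCF and EER. A minimal sketch of weighted score-level fusion and EER computation, assuming NumPy; the z-normalization step and the 50/50 fusion weight are illustrative assumptions, not the submission's actual recipe:

```python
import numpy as np

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted score-level fusion of two verification systems.

    Each system's scores are z-normalized before mixing so that systems
    with different score scales contribute comparably. The 50/50 weight
    is an illustrative default, not the value tuned for the submission.
    """
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return weight * za + (1.0 - weight) * zb

def compute_eer(scores, labels):
    """Equal error rate: the operating point where the false-accept rate
    equals the false-reject rate. labels: 1 = target trial, 0 = impostor."""
    order = np.argsort(scores)[::-1]           # accept highest scores first
    labels = np.asarray(labels)[order]
    n_target = labels.sum()
    n_impostor = len(labels) - n_target
    # Sweep the decision threshold down through the sorted scores.
    fa = np.cumsum(1 - labels) / n_impostor    # impostors accepted so far
    fr = 1.0 - np.cumsum(labels) / n_target    # targets still rejected
    idx = np.argmin(np.abs(fa - fr))
    return (fa[idx] + fr[idx]) / 2.0
```

In practice the fusion weight would be tuned on a development set, and DCF would be computed alongside EER with the challenge's cost parameters.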
Related papers
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- The GUA-Speech System Description for CNVSRC Challenge 2023 [8.5257557043542]
This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.
We use intermediate connectionist temporal classification (Inter CTC) residual modules to relax the conditional independence assumption of CTC in our model.
We also use a bi-transformer decoder to enable the model to capture both past and future contextual information.
arXiv Detail & Related papers (2023-12-12T13:35:33Z)
- Continual Learning for On-Device Speech Recognition using Disentangled Conformers [54.32320258055716]
We introduce a continual learning benchmark for speaker-specific domain adaptation derived from LibriVox audiobooks.
We propose a novel compute-efficient continual learning algorithm called DisentangledCL.
Our experiments show that the DisConformer models significantly outperform baselines on general ASR.
arXiv Detail & Related papers (2022-12-02T18:58:51Z)
- THUEE system description for NIST 2020 SRE CTS challenge [19.2916501364633]
This paper presents the system description of the THUEE team for the NIST 2020 Speaker Recognition Evaluation (SRE) conversational telephone speech (CTS) challenge.
The subsystems including ResNet74, ResNet152, and RepVGG-B2 are developed as speaker embedding extractors in this evaluation.
arXiv Detail & Related papers (2022-10-12T12:01:59Z)
- UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022 [69.67841335302576]
This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022.
Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon.
We augment the architecture with a simple GRU-based module that allows information of recurring identities to flow across scenes.
arXiv Detail & Related papers (2022-06-22T06:11:07Z)
- The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge [43.262531688434215]
We propose two improvements to target-speaker voice activity detection (TS-VAD).
These techniques are designed to handle multi-speaker conversations in real-world meeting scenarios with high speaker-overlap ratios and under heavy reverberation and noise.
arXiv Detail & Related papers (2022-02-10T06:06:48Z)
- STC speaker recognition systems for the NIST SRE 2021 [56.05258832139496]
This paper presents a description of STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation.
These systems consist of a number of diverse subsystems that use deep neural networks as feature extractors.
For the video modality, we developed our best solution using the RetinaFace face detector and a deep ResNet face-embedding extractor trained on large face image datasets.
arXiv Detail & Related papers (2021-11-03T15:31:01Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
- Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020 [9.908371711364717]
We describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020.
One method is to apply query expansion to speaker verification, which shows significant improvement over the baseline in the study.
Another is to combine the Probabilistic Linear Discriminant Analysis (PLDA) score with the ResNet score.
arXiv Detail & Related papers (2020-11-04T05:24:18Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance under controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
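The far-field entry above verifies speakers by comparing deep speaker embeddings. A minimal sketch of the standard cosine-similarity back-end for such embeddings, assuming NumPy; the embedding vectors themselves would come from a trained extractor not shown here:

```python
import numpy as np

def cosine_score(emb_enroll, emb_test):
    """Verification score between two deep speaker embeddings.

    Length-normalizing both embeddings makes the dot product a cosine
    similarity in [-1, 1]; a higher score means the two utterances are
    more likely to come from the same speaker.
    """
    a = emb_enroll / np.linalg.norm(emb_enroll)
    b = emb_test / np.linalg.norm(emb_test)
    return float(np.dot(a, b))
```

A threshold on this score (tuned on a development set) then yields the accept/reject decision; PLDA scoring is the common alternative back-end.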
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.