The JHU submission to VoxSRC-21: Track 3
- URL: http://arxiv.org/abs/2109.13425v1
- Date: Tue, 28 Sep 2021 01:30:10 GMT
- Title: The JHU submission to VoxSRC-21: Track 3
- Authors: Jaejin Cho, Jesus Villalba, Najim Dehak
- Abstract summary: This report describes the Johns Hopkins University speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).
Our overall training process is similar to that of the first-place team in last year's VoxSRC 2020 challenge.
Our best model submitted to the challenge achieves EERs of 1.89%, 6.50%, and 6.89% on the VoxCeleb1 test-o, VoxSRC-21 validation, and VoxSRC-21 test trials, respectively.
- Score: 31.804401484416452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report describes the Johns Hopkins University speaker
recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021
Track 3: Self-supervised speaker verification (closed). Our overall training
process is similar to that proposed by the first-place team in last year's
VoxSRC 2020 challenge. The main difference is that a recently proposed
non-contrastive self-supervised method from computer vision (CV), distillation
with no labels (DINO), is used to train our initial model, and it outperformed
last year's contrastive learning based on momentum contrast (MoCo). In
addition, it requires only a few iterations in the iterative clustering stage,
where pseudo labels for supervised embedding learning are updated based on
clusters of the embeddings generated by a model that is continually fine-tuned
over the iterations. In the final stage, a Res2Net50 is trained on the final
pseudo labels from the iterative clustering stage. This is our best model
submitted to the challenge, achieving EERs of 1.89%, 6.50%, and 6.89% on the
VoxCeleb1 test-o, VoxSRC-21 validation, and VoxSRC-21 test trials, respectively.
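The iterative clustering stage described above alternates between clustering the current model's embeddings and fine-tuning on the resulting pseudo labels. Below is a minimal sketch of that loop; the `model.embed` and `model.fine_tune` routines, the k-means settings, and the iteration count are illustrative placeholders, not the authors' exact configuration.

```python
# A minimal sketch of the iterative clustering stage described in the abstract.
# `model.embed`, `model.fine_tune`, the k-means settings, and the iteration
# count are placeholders, not the authors' exact recipe.
import numpy as np
from sklearn.cluster import KMeans

def iterative_clustering(model, utterances, n_clusters, n_iterations=3):
    """Alternate between clustering embeddings and supervised fine-tuning."""
    pseudo_labels = None
    for _ in range(n_iterations):
        # 1. Extract speaker embeddings with the current (DINO-initialized) model.
        embeddings = np.stack([model.embed(u) for u in utterances])

        # 2. Cluster the embeddings; cluster indices become pseudo speaker labels.
        pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)

        # 3. Fine-tune the model with a supervised speaker-classification loss
        #    on the pseudo labels (hypothetical training routine).
        model = model.fine_tune(utterances, pseudo_labels)
    return model, pseudo_labels
```

In the submission described above, the pseudo labels from the final iteration are then used to train a Res2Net50 as the last supervised stage.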
Related papers
- 1st Place Solution for ECCV 2022 OOD-CV Challenge Image Classification
Track [64.49153847504141]
The OOD-CV challenge is an out-of-distribution generalization task.
In this challenge, our core solution can be summarized as: noisy label learning is a strong test-time domain adaptation method.
After integrating Test-Time Augmentation and Model Ensemble strategies, our solution ranks first on the Image Classification Leaderboard of the OOD-CV Challenge.
arXiv Detail & Related papers (2023-01-12T03:44:30Z) - The SpeakIn Speaker Verification System for Far-Field Speaker
Verification Challenge 2022 [15.453882034529913]
This paper describes speaker verification systems submitted to the Far-Field Speaker Verification Challenge 2022 (FFSVC2022).
The ResNet-based and RepVGG-based architectures were developed for this challenge.
Our approach leads to excellent performance and ranks 1st in both challenge tasks.
arXiv Detail & Related papers (2022-09-23T14:51:55Z) - The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022 [0.0]
We describe the top-scoring submissions from team RTZR to the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22).
The top-performing system is a fusion of 7 models spanning 3 different model architectures.
The final submission achieves 0.165 DCF and 2.912% EER on the VoxSRC-22 test set.
arXiv Detail & Related papers (2022-09-21T06:54:24Z) - Supervision-Guided Codebooks for Masked Prediction in Speech
Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z) - Raw waveform speaker verification for supervised and self-supervised
learning [30.08242210230669]
This paper proposes a new raw waveform speaker verification model that incorporates techniques proven effective for speaker verification.
Under the best performing configuration, the model shows an equal error rate of 0.89%, competitive with state-of-the-art models.
We also explore the proposed model within a self-supervised learning framework and show state-of-the-art performance in this line of research.
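Equal error rate (EER) is the headline metric across these systems (e.g., 1.89% for the JHU submission above and 0.89% here). As a point of reference, below is a minimal, illustrative sketch of how EER can be computed from target and non-target trial scores; it is not any challenge's official scoring tool.

```python
# Illustrative EER computation from verification scores (not an official
# scoring tool); assumes higher scores mean "same speaker".
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Return the EER (%) where false-accept and false-reject rates cross."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target_scores < t).mean() for t in thresholds])      # false rejects
    idx = np.argmin(np.abs(far - frr))  # threshold where the two rates are closest
    return 100.0 * (far[idx] + frr[idx]) / 2.0

# Toy example with synthetic scores.
eer = compute_eer(np.array([2.1, 1.8, 2.5, 1.2]), np.array([-0.5, 0.3, -1.2, 0.8]))
```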
arXiv Detail & Related papers (2022-03-16T09:28:03Z) - The Phonexia VoxCeleb Speaker Recognition Challenge 2021 System
Description [1.3687617973585977]
We describe the Phonexia submission for the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21) in the unsupervised speaker verification track.
An embedding extractor was bootstrapped using momentum contrastive learning, with input augmentations as the only source of supervision.
Score fusion was done by averaging the zt-normalized cosine scores of five different embedding extractors.
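For illustration, here is a hedged sketch of zt-normalized cosine scoring and averaging-based fusion in the spirit of the description above; the cohort handling is simplified (cohort-vs-cohort statistics include self-similarities) and none of the names below come from the Phonexia system.

```python
# Simplified zt-norm cosine scoring and score-averaging fusion (illustrative
# only; not the Phonexia recipe). `cohort` is a set of impostor embeddings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zt_norm_score(enroll, test, cohort, eps=1e-8):
    raw = cosine(enroll, test)
    # z-norm: normalize with enrollment-vs-cohort statistics.
    ez = np.array([cosine(enroll, c) for c in cohort])
    s_z = (raw - ez.mean()) / (ez.std() + eps)
    # t-norm on top: cohort-vs-test scores, each z-normalized with that cohort
    # member's own statistics against the cohort (self-similarity kept for brevity).
    cc = np.array([[cosine(ci, cj) for cj in cohort] for ci in cohort])
    ct = np.array([cosine(c, test) for c in cohort])
    ct_z = (ct - cc.mean(axis=1)) / (cc.std(axis=1) + eps)
    return (s_z - ct_z.mean()) / (ct_z.std() + eps)

def fuse(trial_embeddings, cohorts):
    """Average zt-normalized cosine scores across several embedding extractors.

    trial_embeddings: list of (enroll, test) pairs, one per extractor.
    cohorts: list of impostor-embedding arrays, one per extractor.
    """
    return float(np.mean([zt_norm_score(e, t, c)
                          for (e, t), c in zip(trial_embeddings, cohorts)]))
```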
arXiv Detail & Related papers (2021-09-05T12:10:26Z) - Two-Stream Consensus Network: Submission to HACS Challenge 2021
Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z) - HuBERT: Self-Supervised Speech Representation Learning by Masked
Prediction of Hidden Units [81.53783563025084]
We propose an offline clustering step to provide aligned target labels for a BERT-like prediction loss.
A key ingredient of our approach is applying the prediction loss over the masked regions only.
HuBERT shows up to 19% and 13% relative WER reduction on the more challenging dev-other and test-other evaluation subsets.
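The key idea in the summary above, applying the prediction loss only at masked positions with offline cluster IDs as targets, can be sketched as follows; the encoder and tensor shapes are stand-ins and not the actual HuBERT implementation.

```python
# Illustrative HuBERT-style masked-prediction loss: cross-entropy against
# offline cluster IDs, computed only at masked frames. The encoder is a
# placeholder module, not the real HuBERT architecture or codebase.
import torch
import torch.nn.functional as F

def masked_prediction_loss(encoder, features, cluster_targets, mask):
    """
    features:        (batch, time, feat_dim) input frames
    cluster_targets: (batch, time) pseudo labels from offline clustering (e.g. k-means)
    mask:            (batch, time) boolean tensor, True where frames are masked
    """
    # Zero out the masked frames (HuBERT uses a learned mask embedding instead).
    masked_input = features.masked_fill(mask.unsqueeze(-1), 0.0)
    logits = encoder(masked_input)  # (batch, time, n_clusters)
    # The prediction loss is applied over the masked regions only.
    return F.cross_entropy(logits[mask], cluster_targets[mask])
```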
arXiv Detail & Related papers (2021-06-14T14:14:28Z) - Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID
Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, and we were ranked sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z) - TACRED Revisited: A Thorough Evaluation of the TACRED Relation
Extraction Task [80.38130122127882]
TACRED is one of the largest and most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.