The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022
- URL: http://arxiv.org/abs/2209.10147v1
- Date: Wed, 21 Sep 2022 06:54:24 GMT
- Title: The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022
- Authors: Sangwon Suh, Sunjong Park
- Abstract summary: We describe the top-scoring submissions of team RTZR for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22).
The top-performing system is a fusion of 7 models spanning 3 different model architecture types.
The final submission achieves 0.165 DCF and 2.912% EER on the VoxSRC22 test set.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we describe the top-scoring submissions for team RTZR VoxCeleb
Speaker Recognition Challenge 2022 (VoxSRC-22) in the closed dataset, speaker
verification Track 1. The top-performing system is a fusion of 7 models spanning
3 different model architecture types. We focused on training models to learn
extra temporal information; therefore, all models were trained on 4-6 second
frames per utterance. We also applied the Large Margin Fine-tuning strategy,
which has shown good performance in previous challenges, to some of our fusion
models. During evaluation, we applied scoring methods with adaptive symmetric
normalization (AS-Norm) and matrix score average (MSA). Finally, we fused all
the trained models with logistic regression. The final submission achieves
0.165 DCF and 2.912% EER on the VoxSRC22 test set.
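The abstract's scoring pipeline can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function names (`as_norm`, `fuse_scores`), the cosine back-end, and the `top_k` default are assumptions; AS-Norm here follows the common recipe of normalizing a raw trial score by the statistics of each side's top-k cohort scores, and fusion is shown as applying pre-trained logistic-regression weights to per-model scores.

```python
import numpy as np

def cosine(a, b):
    """Raw cosine-similarity score between two speaker embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def as_norm(enroll, test, cohort, top_k=300):
    """Adaptive symmetric normalization (AS-Norm) sketch: shift and scale
    the raw score by the mean/std of the top-k most competitive cohort
    scores for the enrollment and test sides, then average both terms."""
    raw = cosine(enroll, test)
    cohort = cohort / np.linalg.norm(cohort, axis=1, keepdims=True)
    ce = np.sort(cohort @ (enroll / np.linalg.norm(enroll)))[-top_k:]
    ct = np.sort(cohort @ (test / np.linalg.norm(test)))[-top_k:]
    return 0.5 * ((raw - ce.mean()) / ce.std()
                  + (raw - ct.mean()) / ct.std())

def fuse_scores(score_matrix, weights, bias=0.0):
    """Logistic-regression fusion sketch: combine per-model scores
    (trials x models) with weights learned offline on a dev set."""
    z = score_matrix @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> fused score per trial
```

In practice the fusion weights would be fit with logistic regression on a held-out development set before being applied to challenge trials.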
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022 [15.453882034529913]
This paper describes speaker verification systems submitted to the Far-Field Speaker Verification Challenge 2022 (FFSVC2022)
The ResNet-based and RepVGG-based architectures were developed for this challenge.
Our approach leads to excellent performance and ranks 1st in both challenge tasks.
arXiv Detail & Related papers (2022-09-23T14:51:55Z) - Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z) - RoBLEURT Submission for the WMT2021 Metrics Task [72.26898579202076]
We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations on 8 out of 10 to-English language pairs.
arXiv Detail & Related papers (2022-04-28T08:49:40Z) - The RoyalFlush System of Speech Recognition for M2MeT Challenge [5.863625637354342]
This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge.
We adopted a serialized output training (SOT) based multi-speaker ASR system with large-scale simulation data.
Our system got a 12.22% absolute Character Error Rate (CER) reduction on the validation set and 12.11% on the test set.
arXiv Detail & Related papers (2022-02-03T14:38:26Z) - Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark [14.50261153230204]
We focus on Multimodal Machine Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context).
We identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models.
We propose a systematic framework to address these biases through three Control-Knobs.
arXiv Detail & Related papers (2021-10-22T16:33:57Z) - The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z) - A Coarse to Fine Question Answering System based on Reinforcement Learning [48.80863342506432]
The system is designed using an actor-critic based deep reinforcement learning model to achieve multi-step question answering.
We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups.
arXiv Detail & Related papers (2021-06-01T06:41:48Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model [18.41476971318978]
This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media.
We leverage unsupervised pre-trained models and fine-tune them on our task.
Our best model achieves the highest score of 0.823 and ranks first for all kinds of metrics.
arXiv Detail & Related papers (2020-09-08T12:51:22Z) - Gestalt: a Stacking Ensemble for SQuAD2.0 [0.0]
We propose a deep-learning system that finds, or indicates the lack of, a correct answer to a question in a context paragraph.
Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that outperforms the best model in the ensemble per se.
arXiv Detail & Related papers (2020-04-02T08:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content on this site (including all information) and is not responsible for any consequences.