The NIST CTS Speaker Recognition Challenge
- URL: http://arxiv.org/abs/2204.10228v1
- Date: Thu, 21 Apr 2022 16:06:27 GMT
- Title: The NIST CTS Speaker Recognition Challenge
- Authors: Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Lisa Mason,
Douglas Reynolds
- Abstract summary: The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS Challenge since August 2020.
This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge.
- Score: 1.5282767384702267
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The US National Institute of Standards and Technology (NIST) has been
conducting a second iteration of the CTS Challenge since August 2020. The
current iteration of the CTS Challenge is a leaderboard-style speaker
recognition evaluation using telephony data extracted from the unexposed
portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora
collected by the LDC. The CTS Challenge is currently organized in a similar
manner to the SRE19 CTS Challenge, offering only an open training condition
using two evaluation subsets, namely Progress and Test. Unlike in the SRE19
Challenge, no training or development set was initially released, and NIST has
publicly released the leaderboards on both subsets for the CTS Challenge. Which
subset (i.e., Progress or Test) a trial belongs to is unknown to challenge
participants, and each system submission needs to contain outputs for all of
the trials. The CTS Challenge has also served, and will continue to do so, as a
prerequisite for entrance to the regular SREs (such as SRE21). Since August
2020, a total of 53 organizations (forming 33 teams) from academia and industry
have participated in the CTS Challenge and submitted more than 4400 valid
system outputs. This paper presents an overview of the evaluation and several
analyses of system performance for some primary conditions in the CTS
Challenge. The CTS Challenge results thus far indicate remarkable improvements
in performance due to 1) speaker embeddings extracted using large-scale and
complex neural network architectures such as ResNets trained with angular
margin losses, 2) extensive data augmentation, 3) the use of large amounts of
in-house proprietary data from a large number of labeled speakers, and
4) long-duration fine-tuning.
Related papers
- The VoxCeleb Speaker Recognition Challenge: A Retrospective [75.40776645175585]
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023.
The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings.
We provide a review of these challenges that covers: what they explored; the methods developed by the challenge participants and how these evolved.
arXiv Detail & Related papers (2024-08-27T08:57:31Z)
- Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks [62.443665295250035]
We present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affective Computing (CCAC 2023).
In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions.
arXiv Detail & Related papers (2024-07-20T10:13:54Z)
- ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge [94.13624830833314]
This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
arXiv Detail & Related papers (2024-01-07T12:51:42Z)
- QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning [65.35080911787882]
This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements.
Two VQ-S3R learners provide profitable speech representations and pre-trained models for TTS.
The results powerfully demonstrate the superior performance of QS-TTS, winning the highest MOS over supervised or semi-supervised baseline TTS approaches.
arXiv Detail & Related papers (2023-08-31T20:25:44Z)
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge [95.6159736804855]
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
arXiv Detail & Related papers (2023-02-20T19:27:14Z)
- THUEE system description for NIST 2020 SRE CTS challenge [19.2916501364633]
This paper presents the system description of the THUEE team for the NIST 2020 Speaker Recognition Evaluation (SRE) conversational telephone speech (CTS) challenge.
The subsystems including ResNet74, ResNet152, and RepVGG-B2 are developed as speaker embedding extractors in this evaluation.
arXiv Detail & Related papers (2022-10-12T12:01:59Z)
- Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge [25.69349931845173]
The ISCSLP 2022 CSASR challenge provided participants with two training sets (the TAL_CSASR corpus and the MagicData-RAMC corpus), a development set, and a test set.
More than 40 teams participated in this challenge, and the winning team achieved a Mixture Error Rate (MER) of 16.70% on the test set.
In this paper, we describe the datasets, the associated baseline system and the requirements, and summarize the CSASR challenge results and the major techniques and tricks used in the submitted systems.
arXiv Detail & Related papers (2022-10-12T11:05:13Z)
- The 2021 NIST Speaker Recognition Evaluation [1.5282767384702267]
The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996.
This paper presents an overview of SRE21 including the tasks, performance metric, data, evaluation protocol, results and system performance analyses.
arXiv Detail & Related papers (2022-04-21T16:18:52Z)
- NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition [2.5403247066589074]
This document provides a brief description of the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) conversational telephone speech (CTS) Superset.
The CTS Superset has been created in an attempt to provide the research community with a large-scale dataset.
It contains a large number of telephony speech segments from more than 6800 speakers with speech durations distributed uniformly in the [10s, 60s] range.
arXiv Detail & Related papers (2021-08-16T14:39:23Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.