LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis
- URL: http://arxiv.org/abs/2002.02735v2
- Date: Sun, 24 May 2020 05:28:06 GMT
- Title: LEAP System for SRE19 CTS Challenge -- Improvements and Error Analysis
- Authors: Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh,
Sriram Ganapathy
- Abstract summary: We provide a detailed account of the LEAP SRE system submitted to the CTS challenge.
All the systems used the time-delay neural network (TDNN) based x-vector embeddings.
The system combination of generative and neural PLDA models resulted in significant improvements for the SRE evaluation dataset.
- Score: 36.35711634925221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The NIST Speaker Recognition Evaluation - Conversational Telephone Speech
(CTS) challenge 2019 was an open evaluation for the task of speaker
verification in challenging conditions. In this paper, we provide a detailed
account of the LEAP SRE system submitted to the CTS challenge focusing on the
novel components in the back-end system modeling. All the systems used the
time-delay neural network (TDNN) based x-vector embeddings. The x-vector system
in our SRE19 submission used a large pool of training speakers (about 14k
speakers). Following the x-vector extraction, we explored a neural network
approach to backend score computation that was optimized for a speaker
verification cost. The system combination of generative and neural PLDA models
resulted in significant improvements for the SRE evaluation dataset. We also
found additional gains for the SRE systems based on score normalization and
calibration. Subsequent to the evaluations, we have performed a detailed
analysis of the submitted systems. The analysis revealed the incremental gains
obtained for different training dataset combinations as well as the modeling
methods.
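As an illustration of the backend described above, the following is a minimal PyTorch-style sketch (not the authors' released code) of a discriminative PLDA-style scorer trained with a smoothed detection-cost objective. The quadratic scoring form, dimensions, cost weights, and sigmoid sharpness are illustrative assumptions, not the SRE19 configuration.

```python
import torch
import torch.nn as nn

class NeuralPLDAScorer(nn.Module):
    """Scores an enrollment/test x-vector pair with a PLDA-like quadratic form."""
    def __init__(self, dim: int = 512):
        super().__init__()
        # P models the cross (enrollment-test) term, Q the per-utterance quadratic terms.
        self.P = nn.Parameter(torch.eye(dim) * 0.01)
        self.Q = nn.Parameter(torch.zeros(dim, dim))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x_enroll: torch.Tensor, x_test: torch.Tensor) -> torch.Tensor:
        cross = ((x_enroll @ self.P) * x_test).sum(dim=-1)
        quad = ((x_enroll @ self.Q) * x_enroll).sum(dim=-1) \
             + ((x_test @ self.Q) * x_test).sum(dim=-1)
        return cross + quad + self.bias  # raw verification score (LLR-like)

def soft_detection_cost(scores, labels, threshold=0.0,
                        p_target=0.01, c_miss=1.0, c_fa=1.0, alpha=15.0):
    """Differentiable surrogate of a NIST-style detection cost: sigmoids replace
    the hard miss / false-alarm counts at a fixed decision threshold."""
    p_accept = torch.sigmoid(alpha * (scores - threshold))
    p_miss = ((1.0 - p_accept) * labels).sum() / labels.sum().clamp(min=1)
    p_fa = (p_accept * (1.0 - labels)).sum() / (1.0 - labels).sum().clamp(min=1)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

# One optimisation step on a toy batch of trials (targets = 1, non-targets = 0).
if __name__ == "__main__":
    scorer = NeuralPLDAScorer(dim=512)
    optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)
    x_e, x_t = torch.randn(32, 512), torch.randn(32, 512)
    labels = torch.randint(0, 2, (32,)).float()
    loss = soft_detection_cost(scorer(x_e, x_t), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the abstract's terms, the generative PLDA scores and the scores of a discriminatively trained backend of this kind would then be combined at the score level, followed by score normalization and calibration.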
Related papers
- X-SepFormer: End-to-end Speaker Extraction Network with Explicit
Optimization on Speaker Confusion [5.4878772986187565]
We present an end-to-end TSE model with the proposed loss schemes and a SepFormer backbone.
With SI-SDRi of 19.4 dB and PESQ of 3.81, our best system significantly outperforms the current SOTA systems.
arXiv Detail & Related papers (2023-03-09T04:00:29Z)
- STC speaker recognition systems for the NIST SRE 2021 [56.05258832139496]
This paper presents a description of STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation.
These systems consist of a number of diverse subsystems that use deep neural networks as feature extractors.
For the video modality, our best solution combines the RetinaFace face detector with a deep ResNet face-embedding extractor trained on large face image datasets.
arXiv Detail & Related papers (2021-11-03T15:31:01Z)
- A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights into the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z)
- The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge [6.6238321827660345]
This paper describes the system setup of our submission to the speaker diarisation track (Track 4) of the VoxCeleb Speaker Recognition Challenge 2020.
Our diarisation system includes a well-trained neural-network-based speech enhancement model as a pre-processing front-end for the input speech signals.
arXiv Detail & Related papers (2020-10-22T12:42:07Z)
- Formal Verification of Robustness and Resilience of Learning-Enabled State Estimation Systems [20.491263196235376]
We focus on learning-enabled state estimation systems (LE-SESs), which have been widely used in robotics applications.
We study LE-SESs from the perspective of formal verification, which determines the satisfiability of a system model.
arXiv Detail & Related papers (2020-10-16T11:06:50Z)
- Neural PLDA Modeling for End-to-End Speaker Verification [40.842070706362534]
We propose a neural network approach for backend modeling in speaker verification, called neural PLDA (NPLDA).
In this paper, we extend this work to achieve joint optimization of the embedding neural network (x-vector network) with the NPLDA network in an end-to-end fashion (a minimal illustrative sketch of this joint optimization appears after this list).
We show that the proposed E2E model improves significantly over the x-vector PLDA baseline speaker verification system.
arXiv Detail & Related papers (2020-08-11T05:54:54Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions.
Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
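Referring back to the Neural PLDA entry above, here is a hypothetical sketch of the end-to-end idea: a small stand-in embedding network (in place of the TDNN x-vector extractor) and a PLDA-style scoring head optimized jointly with a smoothed detection cost. Network sizes, input features, and cost settings are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TinyEmbedder(nn.Module):
    """Frame-level encoder + statistics pooling -> utterance embedding
    (a toy stand-in for a TDNN x-vector extractor)."""
    def __init__(self, feat_dim=40, emb_dim=128):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=5, dilation=1, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
        )
        self.proj = nn.Linear(2 * 256, emb_dim)  # mean + std pooling

    def forward(self, feats):                    # feats: (batch, feat_dim, frames)
        h = self.frame_net(feats)
        stats = torch.cat([h.mean(dim=-1), h.std(dim=-1)], dim=-1)
        return self.proj(stats)                  # (batch, emb_dim)

class PairScorer(nn.Module):
    """Bilinear + quadratic scoring of an enrollment/test embedding pair."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.P = nn.Parameter(torch.eye(emb_dim) * 0.01)
        self.Q = nn.Parameter(torch.zeros(emb_dim, emb_dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, e, t):
        cross = ((e @ self.P) * t).sum(-1)
        quad = ((e @ self.Q) * e).sum(-1) + ((t @ self.Q) * t).sum(-1)
        return cross + quad + self.b

def smoothed_dcf(scores, labels, p_tgt=0.01, alpha=15.0, thr=0.0):
    """Sigmoid-smoothed detection cost over a batch of labelled trials."""
    acc = torch.sigmoid(alpha * (scores - thr))
    p_miss = ((1 - acc) * labels).sum() / labels.sum().clamp(min=1)
    p_fa = (acc * (1 - labels)).sum() / (1 - labels).sum().clamp(min=1)
    return p_tgt * p_miss + (1 - p_tgt) * p_fa

# Joint update: gradients flow through both the scorer and the embedder.
embedder, scorer = TinyEmbedder(), PairScorer()
opt = torch.optim.Adam(list(embedder.parameters()) + list(scorer.parameters()), lr=1e-3)
enroll_feats, test_feats = torch.randn(8, 40, 300), torch.randn(8, 40, 300)
labels = torch.randint(0, 2, (8,)).float()
loss = smoothed_dcf(scorer(embedder(enroll_feats), embedder(test_feats)), labels)
opt.zero_grad()
loss.backward()
opt.step()
```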