Pairwise Discriminative Neural PLDA for Speaker Verification
- URL: http://arxiv.org/abs/2001.07034v2
- Date: Fri, 7 Feb 2020 09:32:10 GMT
- Title: Pairwise Discriminative Neural PLDA for Speaker Verification
- Authors: Shreyas Ramoji, Prashant Krishnan V, Prachi Singh, Sriram Ganapathy
- Abstract summary: We propose a Pairwise neural discriminative model for the task of speaker verification.
We construct a differentiable cost function that approximates the speaker verification loss.
Experiments are performed on the NIST SRE 2018 development and evaluation datasets.
- Score: 41.76303371621405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The state-of-the-art approach to speaker verification involves the extraction of
discriminative embeddings like x-vectors followed by a generative model
back-end using a probabilistic linear discriminant analysis (PLDA). In this
paper, we propose a Pairwise neural discriminative model for the task of
speaker verification which operates on a pair of speaker embeddings such as
x-vectors/i-vectors and outputs a score that can be considered as a scaled
log-likelihood ratio. We construct a differentiable cost function which
approximates the speaker verification loss, namely the minimum detection cost. The
pre-processing steps of linear discriminant analysis (LDA), unit length
normalization and within-class covariance normalization are all modeled as
layers of a neural model and the speaker verification cost functions can be
back-propagated through these layers during training. We also explore
regularization techniques to prevent overfitting, which is a major concern in
using discriminative back-end models for verification tasks. The experiments
are performed on the NIST SRE 2018 development and evaluation datasets. We
observe average relative improvements of 8% in the CMN2 condition and 30% in the VAST
condition over the PLDA baseline system.
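The backend described in the abstract lends itself to a compact neural implementation. Below is a minimal PyTorch sketch of those ideas: LDA, unit-length normalization and WCCN modeled as layers of a network, a PLDA-style quadratic pairwise score acting as a scaled log-likelihood ratio, and a sigmoid-smoothed (hence differentiable) approximation of the detection cost. The dimensions, initialization, cost parameters and exact parameterization are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a pairwise neural PLDA backend: preprocessing steps as
# trainable layers, a quadratic pairwise score, and a smoothed detection cost.
import torch
import torch.nn as nn


class PairwiseNeuralPLDA(nn.Module):
    def __init__(self, in_dim=512, lda_dim=170):
        super().__init__()
        self.lda = nn.Linear(in_dim, lda_dim)     # LDA-like affine projection
        self.wccn = nn.Linear(lda_dim, lda_dim)   # WCCN-like whitening transform
        # PLDA-style quadratic scoring parameters (hypothetical initialization)
        self.P = nn.Parameter(torch.eye(lda_dim))              # cross term
        self.Q = nn.Parameter(torch.zeros(lda_dim, lda_dim))   # self terms
        self.bias = nn.Parameter(torch.zeros(1))

    def preprocess(self, x):
        x = self.lda(x)
        x = x / x.norm(dim=-1, keepdim=True).clamp_min(1e-8)   # unit length normalization
        return self.wccn(x)

    def forward(self, x_enroll, x_test):
        e, t = self.preprocess(x_enroll), self.preprocess(x_test)
        cross = 2.0 * torch.einsum('bi,ij,bj->b', e, self.P, t)
        selfs = (torch.einsum('bi,ij,bj->b', e, self.Q, e)
                 + torch.einsum('bi,ij,bj->b', t, self.Q, t))
        return cross + selfs + self.bias          # scaled log-likelihood-ratio score


def soft_detection_cost(scores, labels, theta=0.0, alpha=10.0,
                        c_miss=1.0, c_fa=1.0, p_target=0.05):
    """Sigmoid-smoothed approximation of the detection cost at threshold theta."""
    tgt = scores[labels == 1]
    non = scores[labels == 0]
    p_miss = torch.sigmoid(alpha * (theta - tgt)).mean()   # soft count of misses
    p_fa = torch.sigmoid(alpha * (non - theta)).mean()     # soft count of false alarms
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

In practice the smoothed cost would be evaluated at the operating point(s) of the NIST detection cost function and back-propagated through the scoring and preprocessing layers; a single fixed threshold is shown here for brevity.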
Related papers
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, the generative capability of LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes its decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
- Self-supervised Speaker Diarization [19.111219197011355]
This study proposes an entirely unsupervised deep-learning model for speaker diarization.
Speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker.
arXiv Detail & Related papers (2022-04-08T16:27:14Z)
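The self-supervised diarization entry above trains the speaker-embedding encoder on pairs of adjacent segments that are assumed to belong to the same speaker. Below is a minimal, hypothetical sketch of that idea using a generic convolutional encoder and an NT-Xent-style contrastive objective; the encoder architecture and the exact loss are assumptions, not the paper's specification.

```python
# Hypothetical sketch: adjacent segments are treated as positive pairs, all other
# segments in the batch as negatives. Architecture and loss are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SegmentEncoder(nn.Module):
    def __init__(self, feat_dim=40, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(256, emb_dim))

    def forward(self, x):                       # x: (batch, feat_dim, frames)
        return F.normalize(self.net(x), dim=-1)


def adjacent_pair_loss(encoder, seg_a, seg_b, temperature=0.1):
    """Contrastive loss: (seg_a[i], seg_b[i]) are adjacent segments of one recording."""
    za, zb = encoder(seg_a), encoder(seg_b)     # (batch, emb_dim) each
    sim = za @ zb.t() / temperature             # pairwise cosine similarities
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(sim, targets)
```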
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
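The visualization entry above derives a 2D map of classifiers from rank correlations between their detection scores. One plausible realization, sketched below, computes Kendall's tau between every pair of classifiers' score lists on a common trial set and embeds the resulting dissimilarities with multidimensional scaling; the specific correlation measure and embedding method are assumptions, not necessarily the paper's choices.

```python
# Hypothetical sketch: rank-correlation dissimilarities between classifiers -> 2D map.
import numpy as np
from scipy.stats import kendalltau
from sklearn.manifold import MDS


def classifier_map_2d(score_matrix):
    """score_matrix: (n_classifiers, n_trials) array of detection scores."""
    n = score_matrix.shape[0]
    dissim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(score_matrix[i], score_matrix[j])
            dissim[i, j] = dissim[j, i] = 1.0 - tau       # highly correlated -> close
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
    return mds.fit_transform(dissim)                      # (n_classifiers, 2) coordinates
```

Because only rank correlations are used, the map is insensitive to the absolute scale of each classifier's scores, which is what makes systems with arbitrary score ranges comparable.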
- A Speaker Verification Backend with Robust Performance across Conditions [28.64769660252556]
A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network.
This method is known to result in systems that work poorly on conditions different from those used to train the calibration model.
We propose to modify the standard backend, introducing an adaptive calibrator that uses duration and other automatically extracted side-information to adapt to the conditions of the inputs.
arXiv Detail & Related papers (2021-02-02T21:27:52Z)
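The robust-backend entry above introduces a calibrator whose parameters adapt to side-information such as duration. The sketch below assumes an affine calibration whose scale and offset are predicted by a small network from side-information features (e.g., log-durations of the two sides), trained with a binary cross-entropy objective; the paper's exact conditioning and architecture may differ.

```python
# Hypothetical sketch: duration-adaptive score calibration. The raw score is rescaled
# and shifted by quantities predicted from side-information, so the output can be
# treated as a calibrated log-likelihood-ratio-like score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveCalibrator(nn.Module):
    def __init__(self, side_dim=2, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(side_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))     # predicts [scale, offset]

    def forward(self, raw_score, side_info):
        scale, offset = self.net(side_info).unbind(dim=-1)
        return F.softplus(scale) * raw_score + offset      # condition-adapted score


def calibration_loss(calibrated_scores, labels):
    # Standard logistic (cross-entropy) objective over target / non-target trials.
    return F.binary_cross_entropy_with_logits(calibrated_scores, labels.float())
```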
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
arXiv Detail & Related papers (2020-09-21T00:58:40Z)
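The forensic-verification entry above relies on a knowledge-distillation objective for teacher-student learning on short utterances. A minimal, hypothetical sketch of one such objective is given below: a frozen teacher embeds the full utterance, the student embeds a short crop, and a cosine-distance term pulls the two embeddings together alongside a conventional speaker-classification loss. The loss composition and the crop-based setup are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: teacher-student embedding distillation for short utterances.
import torch
import torch.nn.functional as F


def distillation_loss(student, teacher, full_feats, short_feats,
                      speaker_logits_head, speaker_labels, alpha=0.5):
    with torch.no_grad():
        t_emb = F.normalize(teacher(full_feats), dim=-1)    # frozen teacher, full utterance
    s_emb = F.normalize(student(short_feats), dim=-1)       # student, short crop
    kd = (1.0 - (s_emb * t_emb).sum(dim=-1)).mean()         # cosine-distance distillation term
    ce = F.cross_entropy(speaker_logits_head(s_emb), speaker_labels)
    return alpha * kd + (1.0 - alpha) * ce
```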
- Neural PLDA Modeling for End-to-End Speaker Verification [40.842070706362534]
We propose a neural network approach for backend modeling in speaker verification called the neural PLDA (NPLDA).
In this paper, we extend this work to achieve joint optimization of the embedding neural network (x-vector network) with the NPLDA network in an end-to-end fashion.
We show that the proposed E2E model improves significantly over the x-vector PLDA baseline speaker verification system.
arXiv Detail & Related papers (2020-08-11T05:54:54Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- NPLDA: A Deep Neural PLDA Model for Speaker Verification [40.842070706362534]
We propose a neural network approach for backend modeling in speaker recognition.
The proposed model, termed as neural PLDA (NPLDA), is optimized using the generative PLDA model parameters.
In experiments, the NPLDA model optimized using the proposed loss function improves significantly over the state-of-the-art PLDA-based speaker verification system.
arXiv Detail & Related papers (2020-02-10T05:47:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.