Probabilistic embeddings for speaker diarization
- URL: http://arxiv.org/abs/2004.04096v3
- Date: Fri, 6 Nov 2020 06:16:16 GMT
- Title: Probabilistic embeddings for speaker diarization
- Authors: Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget
- Abstract summary: Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization.
We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix.
These precisions quantify the uncertainty about what the values of the embeddings might have been if they had been extracted from high quality speech segments.
- Score: 13.276960253126656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker embeddings (x-vectors) extracted from very short segments of speech
have recently been shown to give competitive performance in speaker
diarization. We generalize this recipe by extracting from each speech segment,
in parallel with the x-vector, also a diagonal precision matrix, thus providing
a path for the propagation of information about the quality of the speech
segment into a PLDA scoring backend. These precisions quantify the uncertainty
about what the values of the embeddings might have been if they had been
extracted from high quality speech segments. The proposed probabilistic
embeddings (x-vectors with precisions) are interfaced with the PLDA model by
treating the x-vectors as hidden variables and marginalizing them out. We apply
the proposed probabilistic embeddings as input to an agglomerative hierarchical
clustering (AHC) algorithm to perform diarization on the DIHARD'19 evaluation set.
We compute the full PLDA likelihood 'by the book' for each clustering
hypothesis that is considered by AHC. We do joint discriminative training of
the PLDA parameters and of the probabilistic x-vector extractor. We demonstrate
accuracy gains relative to a baseline AHC algorithm that is applied to
traditional x-vectors (without uncertainty) and that uses averaging of binary
log-likelihood-ratios rather than by-the-book scoring.
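The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and involves no discriminative training. It assumes a simplified PLDA model in which the speaker variable z has a standard normal prior, the hidden embedding x is Gaussian around z with diagonal within-speaker precision `w`, and the extractor output e is Gaussian around x with the extracted diagonal precision b. Marginalizing x then gives e an effective precision of 1/(1/b + 1/w) around z, and the likelihood of a cluster has a closed form. All function names, the greedy merge rule, the stopping threshold, and the toy data are assumptions made for illustration.

```python
import numpy as np


def effective_precision(b, w):
    """Per-dimension precision of an observed embedding around the speaker
    variable once the hidden embedding is integrated out: the extractor
    variance 1/b and the within-speaker variance 1/w add."""
    return 1.0 / (1.0 / b + 1.0 / w)


def cluster_loglik(E, P):
    """Log-likelihood that all rows of E come from one speaker.

    Closed form of log integral N(z | 0, I) * prod_i N(e_i | z, diag(1 / p_i)) dz.
    E and P are (n, d) arrays of embeddings and effective precisions.
    """
    p_sum = P.sum(axis=0)
    a = (P * E).sum(axis=0)
    per_segment = 0.5 * (np.log(P) - np.log(2.0 * np.pi) - P * E**2)
    per_dim = 0.5 * a**2 / (1.0 + p_sum) - 0.5 * np.log1p(p_sum)
    return per_segment.sum() + per_dim.sum()


def ahc(E, B, w, threshold=0.0):
    """Greedy AHC: merge the pair of clusters whose merge most increases the
    total log-likelihood of the partition; stop when no merge gains more
    than `threshold` (a hypothetical stopping rule)."""
    P = effective_precision(B, w)
    clusters = [[i] for i in range(len(E))]

    def loglik(idx):
        return cluster_loglik(E[idx], P[idx])

    while len(clusters) > 1:
        best_gain, best_pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = clusters[i] + clusters[j]
                gain = loglik(merged) - loglik(clusters[i]) - loglik(clusters[j])
                if gain > best_gain:
                    best_gain, best_pair = gain, (i, j)
        if best_gain <= threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (made up): two segments near (2, 2) and two near (-2, -2),
# all with the same high extractor precision.
E = np.array([[2.0, 2.1], [1.9, 2.0], [-2.0, -1.9], [-2.1, -2.0]])
B = np.full_like(E, 50.0)
clusters = ahc(E, B, w=10.0)
print(clusters)
```

Because each merge is scored by the change in the full partition likelihood, a segment with low precision contributes little evidence for or against a merge, which is the path by which segment quality reaches the clustering decision in this sketch.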
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
- Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- Diarisation using location tracking with agglomerative clustering [42.13772744221499]
This paper explicitly models the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framework.
Experiments show that the proposed approach is able to yield improvements on a Microsoft rich meeting transcription task.
arXiv Detail & Related papers (2021-09-22T08:54:10Z)
- Kernel Density Estimation by Stagewise Algorithm with a Simple Dictionary [0.0]
This paper studies kernel density estimation by stagewise algorithm with a simple dictionary on U-divergence.
We randomly split an i.i.d. sample into two disjoint sets, one for constructing the kernels in the dictionary and the other for evaluating the estimator.
arXiv Detail & Related papers (2021-07-27T17:05:06Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- MaP: A Matrix-based Prediction Approach to Improve Span Extraction in Machine Reading Comprehension [40.22845723686718]
We propose a novel approach that extends the probability vector to a probability matrix.
For each possible start index, the method always generates an end probability vector.
We evaluate our method on SQuAD 1.1 and three other question answering benchmarks.
arXiv Detail & Related papers (2020-09-29T23:53:50Z)
- Pairwise Discriminative Neural PLDA for Speaker Verification [41.76303371621405]
We propose a pairwise neural discriminative model for the task of speaker verification.
We construct a differentiable cost function which approximates speaker verification loss.
Experiments are performed on the NIST SRE 2018 development and evaluation datasets.
arXiv Detail & Related papers (2020-01-20T09:52:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.