Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA
Clustering of DNN Embeddings
- URL: http://arxiv.org/abs/2104.02469v2
- Date: Wed, 7 Apr 2021 01:39:17 GMT
- Title: Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA
Clustering of DNN Embeddings
- Authors: Kiran Karra, Alan McCree
- Abstract summary: This paper presents a new two-pass version of a system for speaker diarization using clustering and embeddings.
For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning.
We also show significant progress towards a robust single solution for multiple diarization tasks.
- Score: 9.826793576487736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern systems for speaker diarization, such as the recently-developed
VBx approach, rely on clustering of DNN speaker embeddings followed by
resegmentation. Two problems with this approach are that the DNN is not
directly optimized for this task, and the parameters need significant retuning
for different applications. We have recently presented progress in this
direction with a Leave-One-Out Gaussian PLDA (LGP) clustering algorithm and an
approach to training the DNN such that embeddings directly optimize performance
of this scoring method. This paper presents a new two-pass version of this
system, where the second pass uses finer time resolution to significantly
improve overall performance. For the Callhome corpus, we achieve the first
published error rate below 4\% without any task-dependent parameter tuning. We
also show significant progress towards a robust single solution for multiple
diarization tasks.
Related papers
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse
Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z) - Supervised Hierarchical Clustering using Graph Neural Networks for
Speaker Diarization [41.30830281043803]
We propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization.
In this paper, we introduce a hierarchical structure using Graph Neural Network (GNN) to perform supervised clustering.
The supervised clustering is performed using node densities and edge existence probabilities to merge the segments until convergence.
arXiv Detail & Related papers (2023-02-24T16:16:41Z) - Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for
Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators.
We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z) - Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization [30.098268054714048]
Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation.
A popular approach for implementing deep SNNs is ANN-SNN conversion combining both efficient training of ANNs and efficient inference of SNNs.
In this paper, we first identify that such performance degradation stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs.
Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error.
arXiv Detail & Related papers (2022-05-16T06:53:14Z) - A neural network-supported two-stage algorithm for lightweight
dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper.
The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter.
Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs)
arXiv Detail & Related papers (2022-04-06T11:08:28Z) - Tight integration of neural- and clustering-based diarization through
deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a it trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training such that it better fits iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z) - RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching an input maximizing a black-box objective function given a static dataset of input-output queries.
A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function.
Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR.
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z) - Neural Calibration for Scalable Beamforming in FDD Massive MIMO with
Implicit Channel Estimation [10.775558382613077]
Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems.
We propose a deep learning-based approach that directly optimize the beamformers at the base station according to the received uplink pilots.
A neural calibration method is proposed to improve the scalability of the end-to-end design.
arXiv Detail & Related papers (2021-08-03T14:26:14Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Robust Learning Rate Selection for Stochastic Optimization via Splitting
Diagnostic [5.395127324484869]
SplitSGD is a new dynamic learning schedule for optimization.
The method decreases the learning rate for better adaptation to the local geometry of the objective function.
It essentially does not incur additional computational cost than standard SGD.
arXiv Detail & Related papers (2019-10-18T19:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.