Related papers: Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

URL: http://arxiv.org/abs/2104.02469v2
Date: Wed, 7 Apr 2021 01:39:17 GMT
Title: Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings
Authors: Kiran Karra, Alan McCree
Abstract summary: This paper presents a new two-pass version of a system for speaker diarization using clustering and embeddings. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.
Score: 9.826793576487736
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many modern systems for speaker diarization, such as the recently-developed VBx approach, rely on clustering of DNN speaker embeddings followed by resegmentation. Two problems with this approach are that the DNN is not directly optimized for this task, and the parameters need significant retuning for different applications. We have recently presented progress in this direction with a Leave-One-Out Gaussian PLDA (LGP) clustering algorithm and an approach to training the DNN such that embeddings directly optimize performance of this scoring method. This paper presents a new two-pass version of this system, where the second pass uses finer time resolution to significantly improve overall performance. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.

Related papers

Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides, which are equipped with numerous low-cost antennas, named pinching antennas (PAs). The positions of PAs can be reconfigured to both spanning large-scale path and space.
arXiv Detail & Related papers (2025-02-12T18:54:10Z)
An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem [21.948190231334088]
We propose DEITSP, a diffusion model with efficient iterations tailored for Traveling Salesman Problems. We introduce a one-step diffusion model that integrates the controlled discrete noise addition process with self-consistency enhancement. We also design a dual-modality graph transformer to bolster the extraction and fusion of features from node and edge modalities.
arXiv Detail & Related papers (2025-01-23T15:47:04Z)
Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices. We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling. Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization [41.30830281043803]
We propose a novel Supervised HierArchical gRaph Clustering algorithm (SHARC) for speaker diarization. In this paper, we introduce a hierarchical structure using Graph Neural Network (GNN) to perform supervised clustering. The supervised clustering is performed using node densities and edge existence probabilities to merge the segments until convergence.
arXiv Detail & Related papers (2023-02-24T16:16:41Z)
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z)
Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization [30.098268054714048]
Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation. A popular approach for implementing deep SNNs is ANN-SNN conversion combining both efficient training of ANNs and efficient inference of SNNs. In this paper, we first identify that such performance degradation stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs. Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error.
arXiv Detail & Related papers (2022-05-16T06:53:14Z)
A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices [13.49645012479288]
A two-stage lightweight online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filter with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs).
arXiv Detail & Related papers (2022-04-06T11:08:28Z)
Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework. Speaker embeddings are optimized during training such that they better fit iGMM clustering. Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching for an input that maximizes a black-box objective function, given a static dataset of input-output queries. A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function. Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z)
Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR. We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields. We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
Neural Calibration for Scalable Beamforming in FDD Massive MIMO with Implicit Channel Estimation [10.775558382613077]
Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems. We propose a deep learning-based approach that directly optimizes the beamformers at the base station according to the received uplink pilots. A neural calibration method is proposed to improve the scalability of the end-to-end design.
arXiv Detail & Related papers (2021-08-03T14:26:14Z)
Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network. We design two optimal algorithms that attain these lower bounds. We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic [5.395127324484869]
SplitSGD is a new dynamic learning schedule for optimization. The method decreases the learning rate for better adaptation to the local geometry of the objective function. It incurs essentially no additional computational cost over standard SGD.
arXiv Detail & Related papers (2019-10-18T19:38:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.