Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
- URL: http://arxiv.org/abs/2210.13690v4
- Date: Mon, 8 Jan 2024 17:05:51 GMT
- Title: Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
- Authors: Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno
- Abstract summary: A multi-stage clustering strategy that uses different clustering algorithms for inputs of different lengths can address the multi-faceted challenges of speaker diarization applications.
This strategy is critical for streaming on-device speaker diarization systems, where the budgets of CPU, memory and battery are tight.
- Score: 18.62774420511154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent research advances in speaker diarization mostly focus on
improving the quality of diarization results, there is also an increasing
interest in improving the efficiency of diarization systems. In this paper, we
demonstrate that a multi-stage clustering strategy that uses different
clustering algorithms for inputs of different lengths can address multi-faceted
challenges of on-device speaker diarization applications. Specifically, a
fallback clusterer is used to handle short-form inputs; a main clusterer is
used to handle medium-length inputs; and a pre-clusterer is used to compress
long-form inputs before they are processed by the main clusterer. Both the main
clusterer and the pre-clusterer can be configured with an upper bound on the
computational complexity to adapt to devices with different resource
constraints. This multi-stage clustering strategy is critical for streaming
on-device speaker diarization systems, where the budgets of CPU, memory and
battery are tight.
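The length-based routing described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: using agglomerative hierarchical clustering (AHC) for the fallback clusterer and the pre-clusterer, eigen-gap spectral clustering for the main clusterer, and all names and thresholds (`multi_stage_cluster`, `min_main`, `max_main`, `fallback_threshold`) are hypothetical choices made for illustration. Here `max_main` stands in for the configurable upper bound on how many inputs the main clusterer sees.

```python
import numpy as np


def _normalize(x):
    # L2-normalize rows so that dot products are cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)


def ahc(x, num_clusters=None, threshold=None):
    """Naive centroid-linkage agglomerative clustering on cosine similarity.

    Merges the most similar pair until `num_clusters` remain or the best
    similarity falls below `threshold`. Returns one integer label per row.
    """
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        cents = _normalize(np.array([x[c].mean(axis=0) for c in clusters]))
        sim = cents @ cents.T
        np.fill_diagonal(sim, -np.inf)
        i, j = (int(v) for v in np.unravel_index(np.argmax(sim), sim.shape))
        if threshold is not None and sim[i, j] < threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def kmeans(x, k, iters=50):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral(x, max_speakers=8):
    """Spectral clustering; the number of speakers is picked by the eigen-gap."""
    if len(x) < 2:
        return np.zeros(len(x), dtype=int)
    aff = np.maximum(x @ x.T, 0.0)  # cosine affinity with negatives clipped to 0
    d_inv_sqrt = 1.0 / np.sqrt(aff.sum(axis=1) + 1e-12)
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * aff * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    k_max = min(max_speakers, len(x) - 1)
    k = int(np.argmax(np.diff(eigvals[: k_max + 1]))) + 1
    return kmeans(_normalize(eigvecs[:, :k]), k)


def multi_stage_cluster(embeddings, min_main=10, max_main=200, fallback_threshold=0.5):
    """Route input to a clusterer by its length (all thresholds are hypothetical)."""
    x = _normalize(np.asarray(embeddings, dtype=float))
    n = len(x)
    if n < min_main:
        # Short-form input: assumed too short for spectral clustering to be reliable.
        return ahc(x, threshold=fallback_threshold)
    if n <= max_main:
        # Medium-length input: run the main clusterer on every embedding.
        return spectral(x)
    # Long-form input: compress to max_main pre-clusters, run the main
    # clusterer on their centroids, then map the labels back to each input.
    pre = ahc(x, num_clusters=max_main)
    cents = _normalize(np.array([x[pre == c].mean(axis=0) for c in range(max_main)]))
    return spectral(cents)[pre]
```

For example, `multi_stage_cluster(embeddings, max_main=200)` on 500 embeddings would first pre-cluster them into 200 groups and then run spectral clustering on only 200 centroids. Capping the main clusterer's input is what bounds its cost, since the eigendecomposition in `spectral` grows roughly cubically with input size. The naive AHC used here as a pre-clusterer is itself not cheap, so a real on-device system would need a more efficient pre-clustering step.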
Related papers
- A3S: A General Active Clustering Method with Pairwise Constraints [66.74627463101837]
A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm.
In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries.
(2024-07-14T13:37:03Z)
- Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization [41.24045486520547]
We propose an end-to-end supervised hierarchical clustering algorithm based on graph neural networks (GNN).
The proposed E-SHARC framework improves significantly over the state-of-the-art diarization systems.
(2024-01-23T15:35:44Z)
- Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining.
Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks.
Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
(2022-10-09T02:31:32Z)
- DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks [53.88811980967342]
This paper presents a Deep Clustering via Ensembles (DeepCluE) approach.
It bridges the gap between deep clustering and ensemble clustering by harnessing the power of multiple layers in deep neural networks.
Experimental results on six image datasets confirm the advantages of DeepCluE over the state-of-the-art deep clustering approaches.
(2022-06-01T09:51:38Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
(2021-11-19T04:10:18Z)
- Fast and Interpretable Consensus Clustering via Minipatch Learning [0.0]
We develop IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering.
We develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings.
Results show that our approach yields more accurate and interpretable cluster solutions.
(2021-10-05T22:39:28Z)
- Augmented Data as an Auxiliary Plug-in Towards Categorization of Crowdsourced Heritage Data [2.609784101826762]
We propose a strategy to mitigate the problem of inefficient clustering performance by introducing data augmentation as an auxiliary plug-in.
We train a variant of Convolutional Autoencoder (CAE) with augmented data to construct the initial feature space as a novel model for deep clustering.
(2021-07-08T14:09:39Z)
- Unsupervised Clustered Federated Learning in Complex Multi-source Acoustic Environments [75.8001929811943]
We introduce a realistic and challenging multi-source, multi-room acoustic environment.
We present an improved clustering control strategy that takes into account the variability of the acoustic scene.
The proposed approach is optimized using clustering-based measures and validated via a network-wide classification task.
(2021-06-07T14:51:39Z)
- Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding [90.77031668988661]
Cluster-Former is a novel clustering-based sparse Transformer to perform attention across chunked sequences.
The proposed framework is pivoted on two unique types of Transformer layer: Sliding-Window Layer and Cluster-Former Layer.
Experiments show that Cluster-Former achieves state-of-the-art performance on several major QA benchmarks.
(2020-09-13T22:09:30Z)
- A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder [27.211505187332385]
Traditional speaker clustering methods based on agglomerative hierarchical clustering (AHC) suffer from long running times and remain sensitive to environmental noise.
We propose a novel speaker clustering method based on Mutual Information (MI) and a non-linear model with discrete variables, inspired by the Tied Variational Autoencoder (TVAE), to enhance robustness against noise.
(2020-03-04T08:54:38Z)