Towards Neural Diarization for Unlimited Numbers of Speakers Using
Global and Local Attractors
- URL: http://arxiv.org/abs/2107.01545v1
- Date: Sun, 4 Jul 2021 05:34:21 GMT
- Title: Towards Neural Diarization for Unlimited Numbers of Speakers Using
Global and Local Attractors
- Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yawen Xue, Yuki
Takashima, Yohei Kawaguchi
- Abstract summary: We introduce an unsupervised clustering process embedded in the attractor-based end-to-end diarization.
Our method achieved diarization error rates (DERs) of 11.84%, 28.33%, and 19.49% on the CALLHOME, DIHARD II, and DIHARD III datasets.
- Score: 51.01295414889487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attractor-based end-to-end diarization is achieving comparable accuracy to
the carefully tuned conventional clustering-based methods on challenging
datasets. However, its main drawback is that it cannot deal with cases where
the number of speakers is larger than that observed during training, because
its speaker counting relies on supervised learning. In this work, we
introduce an unsupervised clustering process embedded in the attractor-based
end-to-end diarization. We first split a sequence of frame-wise embeddings into
short subsequences and then perform attractor-based diarization for each
subsequence. Given subsequence-wise diarization results, inter-subsequence
speaker correspondence is obtained by unsupervised clustering of the vectors
computed from the attractors from all the subsequences. This makes it possible
to produce diarization results of a large number of speakers for the whole
recording even if the number of output speakers for each subsequence is
limited. Experimental results showed that our method could produce accurate
diarization results for an unseen number of speakers. Our method achieved
diarization error rates (DERs) of 11.84%, 28.33%, and 19.49% on the CALLHOME,
DIHARD II, and DIHARD III datasets, respectively, each of which is better than
conventional end-to-end diarization methods.
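As a concrete illustration of the split-then-link procedure described in the abstract, here is a minimal, hypothetical Python sketch assuming NumPy and scikit-learn. K-means centroids stand in for the attractor decoder (the paper uses an EEND-EDA-style network and clusters vectors derived from the attractors, not the raw attractors); all function names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def local_attractors(emb, max_local_speakers=2):
    """Stand-in for attractor-based diarization of one subsequence.

    Returns (attractors, frame_labels). A real EEND-EDA decoder would also
    estimate how many speakers are active in the subsequence.
    """
    km = KMeans(n_clusters=max_local_speakers, n_init=10).fit(emb)
    return km.cluster_centers_, km.labels_

def diarize(embeddings, subseq_len=200, n_global_speakers=4):
    # Assumes each subsequence has at least max_local_speakers frames.
    T = len(embeddings)
    all_attractors, chunk_labels, offsets = [], [], []
    for start in range(0, T, subseq_len):
        att, lab = local_attractors(embeddings[start:start + subseq_len])
        offsets.append(len(all_attractors))  # where this chunk's attractors begin
        all_attractors.extend(att)
        chunk_labels.append(lab)
    # Unsupervised clustering of the attractors collected from all
    # subsequences gives the inter-subsequence speaker correspondence, so the
    # whole recording may contain more speakers than any single subsequence.
    global_id = AgglomerativeClustering(
        n_clusters=n_global_speakers).fit_predict(np.array(all_attractors))
    frame_speaker = np.empty(T, dtype=int)
    for i, start in enumerate(range(0, T, subseq_len)):
        lab = chunk_labels[i]
        frame_speaker[start:start + len(lab)] = global_id[offsets[i] + lab]
    return frame_speaker

rng = np.random.default_rng(0)
print(diarize(rng.normal(size=(1000, 32)))[:20])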
Related papers
- Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for
Speaker Diarization [41.24045486520547]
We propose an end-to-end supervised hierarchical clustering algorithm based on graph neural networks (GNNs).
The proposed E-SHARC framework improves significantly over state-of-the-art diarization systems.
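A rough, hypothetical sketch of the supervised hierarchical clustering structure: the paper trains a GNN to score merges, which is replaced here by a fixed linear stub, so only the merge-order-by-learned-score structure is the point.

```python
import numpy as np

def merge_score(ca, cb, w):
    # Stub pairwise scorer: E-SHARC trains a GNN for this role; here a
    # fixed linear function of the centroid difference stands in.
    return -np.abs(ca.mean(axis=0) - cb.mean(axis=0)) @ w

def supervised_hierarchical_clustering(emb, n_clusters, w):
    clusters = [[i] for i in range(len(emb))]  # start from singletons
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = merge_score(emb[clusters[i]], emb[clusters[j]], w)
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)  # merge the best-scoring pair
    labels = np.empty(len(emb), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.2, (6, 4)), rng.normal(3, 0.2, (6, 4))])
print(supervised_hierarchical_clustering(emb, 2, np.ones(4)))
```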
arXiv Detail & Related papers (2024-01-23T15:35:44Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC)
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
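A skeleton of the four-step pipeline, with hypothetical stand-ins for each stage (the actual front ends, augmentations, and classifiers are not specified in this summary):

```python
import numpy as np

def extract_spectrogram(waveform, n_fft=1024, hop=512):
    # Front end: magnitude spectrogram via short-time frames + FFT.
    frames = [waveform[i:i + n_fft] for i in range(0, len(waveform) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def augment(spec, rng):
    # Online augmentation: e.g., random frequency masking (SpecAugment-style).
    f0 = rng.integers(0, spec.shape[1] - 8)
    spec = spec.copy()
    spec[:, f0:f0 + 8] = 0.0
    return spec

def classify(spec, n_classes=10):
    # Back end: placeholder classifier; a real system uses a trained CNN here.
    logits = spec.mean(axis=0)[:n_classes]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def late_fusion(prob_list):
    # Late fusion: average the predicted probabilities of several models.
    return np.mean(prob_list, axis=0)

rng = np.random.default_rng(0)
wave = rng.normal(size=16000)
spec = extract_spectrogram(wave)
probs = [classify(augment(spec, rng)) for _ in range(3)]
print(late_fusion(probs).argmax())
```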
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure [10.508187462682308]
Clustering-based speaker diarization remains one of the major approaches in practice.
We propose to view clustering-based diarization as a community detection problem.
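A minimal sketch of the community-detection view, assuming NumPy and NetworkX: segment embeddings become graph nodes, pairs above a similarity threshold become edges, and detected communities define the speakers. The threshold and the greedy-modularity detector are illustrative choices, not necessarily the paper's exact recipe.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def diarize_by_communities(embeddings, threshold=0.5):
    # Cosine similarity between all segment embeddings.
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norm @ norm.T
    g = nx.Graph()
    g.add_nodes_from(range(len(embeddings)))
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if sim[i, j] > threshold:
                g.add_edge(i, j, weight=sim[i, j])
    # Each detected community is treated as one speaker.
    communities = greedy_modularity_communities(g, weight="weight")
    labels = np.zeros(len(embeddings), dtype=int)
    for spk, members in enumerate(communities):
        labels[list(members)] = spk
    return labels

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (10, 8)) + 1, rng.normal(0, 0.1, (10, 8)) - 1])
print(diarize_by_communities(emb))
```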
arXiv Detail & Related papers (2022-04-26T07:18:05Z)
- Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers [8.380514397417457]
This paper formulates the speech separation with the unknown number of speakers as a multi-pass source extraction problem.
Experiments show that the proposed method achieved state-of-the-art performance on the WSJ0 dataset with varying numbers of speakers.
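A sketch of the multi-pass idea with a stub extractor and an assumed energy-based stopping rule: peel off one source per pass and stop when the residual looks like silence.

```python
import numpy as np

def extract_one_source(mixture):
    # Placeholder for a learned extraction network; it returns half of the
    # residual here so the demo loop terminates.
    return 0.5 * mixture

def recursive_separation(mixture, energy_floor=1e-3, max_passes=8):
    sources, residual = [], mixture.copy()
    for _ in range(max_passes):
        if np.mean(residual ** 2) < energy_floor:
            break  # residual is (nearly) silent: all speakers extracted
        s = extract_one_source(residual)
        sources.append(s)
        residual = residual - s
    return sources

mix = np.random.default_rng(0).normal(size=8000)
print(len(recursive_separation(mix)))  # number of extracted "speakers"
```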
arXiv Detail & Related papers (2022-03-30T04:45:34Z)
- Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model [84.57667267657382]
This paper introduces a trainable clustering algorithm into the integration framework.
Speaker embeddings are optimized during training so that they better fit iGMM clustering.
Experimental results show that the proposed approach outperforms the conventional approach in terms of diarization error rate.
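The unfolding idea, sketched with unrolled soft k-means standing in for the paper's iGMM inference: each loop iteration corresponds to one unfolded layer, whose temperature beta could be made learnable. This is a generic stand-in, not the paper's algorithm.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unfolded_soft_kmeans(emb, n_clusters=2, n_layers=5, beta=5.0):
    # Simple deterministic init: evenly spaced points as initial means.
    means = emb[np.linspace(0, len(emb) - 1, n_clusters).astype(int)].copy()
    for _ in range(n_layers):  # each iteration = one unfolded layer
        d = ((emb[:, None, :] - means[None]) ** 2).sum(-1)
        resp = softmax(-beta * d, axis=1)              # E-step (responsibilities)
        means = (resp.T @ emb) / resp.sum(0)[:, None]  # M-step (means)
    return resp.argmax(1)

rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (5, 4)), rng.normal(5, 0.1, (5, 4))])
print(unfolded_soft_kmeans(emb))
```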
arXiv Detail & Related papers (2022-02-14T07:45:21Z)
- End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
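A sketch of the pairwise refinement loop with a stub two-speaker model: frames assigned to either speaker of a pair are re-decoded, which lets the result mark overlap the clustering stage could not.

```python
import numpy as np
from itertools import combinations

def two_speaker_eend(features):
    # Placeholder: a real model returns per-frame activity in [0, 1] for
    # exactly two speakers; here both are marked active as a dummy. A real
    # system must also resolve which output corresponds to which cluster
    # (e.g., by overlap with the initial labels).
    return np.ones((len(features), 2))

def refine(features, cluster_labels, threshold=0.5):
    n_spk = cluster_labels.max() + 1
    activity = np.zeros((len(features), n_spk))
    for a, b in combinations(range(n_spk), 2):
        idx = np.where((cluster_labels == a) | (cluster_labels == b))[0]
        probs = two_speaker_eend(features[idx])
        # Overlapping frames can now be active for both a and b.
        activity[idx, a] = np.maximum(activity[idx, a], probs[:, 0])
        activity[idx, b] = np.maximum(activity[idx, b], probs[:, 1])
    return activity > threshold

labels = np.repeat(np.arange(3), [20, 20, 10])
print(refine(np.zeros((50, 8)), labels).sum(axis=0))
```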
arXiv Detail & Related papers (2020-12-18T05:31:07Z)
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z)
- Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified.
arXiv Detail & Related papers (2020-09-24T01:07:33Z)
- Neural Speaker Diarization with Speaker-Wise Chain Rule [45.60980782843576]
We propose a speaker-wise conditional inference method for speaker diarization.
We show that the proposed method can correctly produce diarization results with a variable number of speakers.
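A sketch of the speaker-wise conditional decoding loop, with a stub network and an assumed stopping heuristic: decode one speaker at a time, conditioning on the activities already produced, and stop when the newly decoded speaker is essentially silent.

```python
import numpy as np

def decode_next_speaker(features, previous_activities):
    # Placeholder for p(y_k | X, y_1..y_{k-1}); a real model is a neural
    # network attending to both the input and the earlier speakers.
    k = len(previous_activities)
    return (np.arange(len(features)) % 4 == k).astype(float)  # dummy output

def chain_rule_diarization(features, max_speakers=8, silence_ratio=0.05):
    activities = []
    for _ in range(max_speakers):
        y = decode_next_speaker(features, activities)
        if y.mean() < silence_ratio:
            break  # an (almost) all-silent output means no speakers remain
        activities.append(y)
    return np.stack(activities) if activities else np.zeros((0, len(features)))

feats = np.zeros((100, 16))
print(chain_rule_diarization(feats).shape)  # (num_speakers, num_frames)
```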
arXiv Detail & Related papers (2020-06-02T17:28:12Z)