End-to-End Speaker Diarization as Post-Processing
- URL: http://arxiv.org/abs/2012.10055v2
- Date: Wed, 23 Dec 2020 15:56:02 GMT
- Title: End-to-End Speaker Diarization as Post-Processing
- Authors: Shota Horiguchi, Paola Garcia, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu
- Abstract summary: Clustering-based diarization methods partition frames into as many clusters as there are speakers, assigning each frame to exactly one speaker.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
- Score: 64.12519350944572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the utilization of an end-to-end diarization model as
post-processing of conventional clustering-based diarization. Clustering-based
diarization methods partition frames into clusters of the number of speakers;
thus, they typically cannot handle overlapping speech because each frame is
assigned to one speaker. On the other hand, some end-to-end diarization methods
can handle overlapping speech by treating the problem as multi-label
classification. Although some methods can treat a flexible number of speakers,
they do not perform well when the number of speakers is large. To compensate
for each other's weakness, we propose to use a two-speaker end-to-end
diarization method as post-processing of the results obtained by a
clustering-based method. We iteratively select two speakers from the results
and update the results of the two speakers to improve the overlapped region.
Experimental results show that the proposed algorithm consistently improved the
performance of the state-of-the-art methods across CALLHOME, AMI, and DIHARD II
datasets.
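The iterative pairwise update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine_pair`, `postprocess`, and `toy_model` are hypothetical names, the label matrices are toy data, and the stub model simply returns known activity in swapped order so that the permutation step is exercised. A real two-speaker end-to-end model would receive audio features for the selected frames and would not be told which speaker indices are being refined.

```python
from itertools import combinations


def refine_pair(labels, i, j, model):
    """Update columns i and j of `labels` in place using a two-speaker model."""
    # Select frames where at least one of the two chosen speakers is active.
    frames = [t for t, row in enumerate(labels) if row[i] or row[j]]
    if not frames:
        return
    pred = model(frames, i, j)  # one (a, b) activity pair per selected frame
    # The model's two outputs are unordered, so pick the speaker order that
    # agrees best with the current clustering-based labels.
    keep = sum((a == labels[t][i]) + (b == labels[t][j])
               for t, (a, b) in zip(frames, pred))
    swap = sum((b == labels[t][i]) + (a == labels[t][j])
               for t, (a, b) in zip(frames, pred))
    for t, (a, b) in zip(frames, pred):
        labels[t][i], labels[t][j] = (a, b) if keep >= swap else (b, a)


def postprocess(labels, model):
    """Refine every speaker pair in turn; the input matrix is not modified."""
    refined = [row[:] for row in labels]
    n_speakers = len(refined[0]) if refined else 0
    for i, j in combinations(range(n_speakers), 2):
        refine_pair(refined, i, j, model)
    return refined


# Toy data (assumption): rows are frames, columns are speakers, 1 = active.
truth = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]]
# Clustering output: at most one speaker per frame, so overlaps are missed.
clustering = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]


def toy_model(frames, i, j):
    """Stand-in for a two-speaker model: true activity, in swapped order."""
    return [(truth[t][j], truth[t][i]) for t in frames]


refined = postprocess(clustering, toy_model)
print(refined == truth)  # True for this toy data
```

With a perfect stub the overlapped frames are recovered exactly; with a real model the gain would depend on how well the two-speaker model handles each selected pair.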
Related papers
- Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios [0.9094127664014627]
End-to-end neural speaker diarization systems are able to address the speaker diarization task while effectively handling speech overlap.
This work explores the incorporation of speaker information embeddings into the end-to-end systems to enhance the speaker discriminative capabilities.
arXiv (2024-07-01T14:26:28Z)
- In search of strong embedding extractors for speaker diarisation [49.7017388682077]
We tackle two key problems when adopting embedding extractors (EEs) for speaker diarisation.
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance.
We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input.
arXiv (2022-10-26T13:00:29Z)
- Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers [8.380514397417457]
This paper formulates speech separation with an unknown number of speakers as a multi-pass source extraction problem.
Experiments show that the proposed method achieved state-of-the-art performance on the WSJ0 dataset with varying numbers of speakers.
arXiv (2022-03-30T04:45:34Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv (2022-03-18T06:40:39Z)
- Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors [51.01295414889487]
We introduce an unsupervised clustering process embedded in the attractor-based end-to-end diarization.
Our method achieved diarization error rates of 11.84%, 28.33%, and 19.49% on the CALLHOME, DIHARD II, and DIHARD III datasets, respectively.
arXiv (2021-07-04T05:34:21Z)
- End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
arXiv (2021-05-05T14:55:29Z)
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv (2020-10-26T06:33:02Z)
- Neural Speaker Diarization with Speaker-Wise Chain Rule [45.60980782843576]
We propose a speaker-wise conditional inference method for speaker diarization.
We show that the proposed method can correctly produce diarization results with a variable number of speakers.
arXiv (2020-06-02T17:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.