Online Neural Diarization of Unlimited Numbers of Speakers
- URL: http://arxiv.org/abs/2206.02432v1
- Date: Mon, 6 Jun 2022 08:48:26 GMT
- Title: Online Neural Diarization of Unlimited Numbers of Speakers
- Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei
Kawaguchi
- Abstract summary: A method to perform speaker diarization for an unlimited number of speakers is described in this paper.
The output number of speakers of attractor-based EEND is empirically capped.
EEND-GLA solves this problem by introducing unsupervised clustering into attractor-based EEND.
- Score: 34.465500195087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A method to perform offline and online speaker diarization for an unlimited
number of speakers is described in this paper. End-to-end neural diarization
(EEND) has achieved overlap-aware speaker diarization by formulating it as a
multi-label classification problem. It has also been extended for a flexible
number of speakers by introducing speaker-wise attractors. However, the output
number of speakers of attractor-based EEND is empirically capped; it cannot
deal with cases where the number of speakers appearing during inference is
higher than that during training because its speaker counting is trained in a
fully supervised manner. Our method, EEND-GLA, solves this problem by
introducing unsupervised clustering into attractor-based EEND. In the method,
the input audio is first divided into short blocks, then attractor-based
diarization is performed for each block, and finally the results of each block
are clustered on the basis of the similarity between locally-calculated
attractors. While the number of output speakers is limited within each block,
the total number of speakers estimated for the entire input can be higher than
the limitation. To use EEND-GLA in an online manner, our method also extends
the speaker-tracing buffer, which was originally proposed to enable online
inference of conventional EEND. We introduce a block-wise buffer update to
make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve
online diarization, our method refines the buffer update and revisits
the variable chunk-size training of EEND. The experimental results demonstrate
that EEND-GLA can perform speaker diarization of an unseen number of speakers
in both offline and online inferences.
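The block-wise procedure in the abstract can be illustrated with a minimal sketch. This is not the paper's actual EEND-GLA implementation: the attractor vectors here are synthetic stand-ins for the network's locally-calculated attractors, and a greedy cosine-similarity matching stands in for the paper's clustering step. The point it demonstrates is the core idea: each block outputs at most a fixed number of local speakers, yet linking blocks by attractor similarity yields more global speakers than the per-block cap.

```python
import numpy as np

def link_block_attractors(block_attractors, sim_threshold=0.8):
    """Link per-block local attractors to globally consistent speaker IDs.

    block_attractors: list of (n_local_speakers, dim) arrays, one per block.
    Returns (assignments, n_global_speakers), where assignments[b][k] is the
    global speaker ID of the k-th local attractor in block b.
    Hypothetical sketch: greedy cosine matching, not the paper's algorithm.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    global_attractors = []   # one representative vector per global speaker
    assignments = []
    for attractors in block_attractors:
        ids = []
        for a in attractors:
            sims = [cos(a, g) for g in global_attractors]
            if sims and max(sims) >= sim_threshold:
                ids.append(int(np.argmax(sims)))      # reuse an existing speaker
            else:
                global_attractors.append(a)           # open a new global speaker
                ids.append(len(global_attractors) - 1)
        assignments.append(ids)
    return assignments, len(global_attractors)

# Toy demo: three well-separated synthetic "speakers", but each block
# observes at most two of them (the per-block cap).
rng = np.random.default_rng(0)
speakers = np.eye(3, 16)  # orthogonal unit vectors stand in for attractors
blocks = [
    speakers[[0, 1]] + 0.01 * rng.normal(size=(2, 16)),  # block 1: speakers 0, 1
    speakers[[1, 2]] + 0.01 * rng.normal(size=(2, 16)),  # block 2: speakers 1, 2
]
assignments, n_speakers = link_block_attractors(blocks)
# n_speakers exceeds the per-block maximum of two, because the shared
# speaker in both blocks is linked while the others stay distinct
```

Each block in isolation produces at most two speakers, but linking across blocks recovers all three, which is exactly how EEND-GLA escapes the supervised counting cap of attractor-based EEND.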
Related papers
- Controllable speech synthesis by learning discrete phoneme-level
prosodic representations [53.926969174260705]
We present a novel method for phoneme-level prosody control of F0 and duration using intuitive discrete labels.
We propose an unsupervised prosodic clustering process which is used to discretize phoneme-level F0 and duration features from a multispeaker speech dataset.
arXiv Detail & Related papers (2022-11-29T15:43:36Z)
- Improved Prosodic Clustering for Multispeaker and Speaker-independent
Phoneme-level Prosody Control [48.3671993252296]
This paper presents a method for phoneme-level prosody control of F0 and duration on a multispeaker text-to-speech setup.
An autoregressive attention-based model is used, incorporating multispeaker architecture modules in parallel to a prosody encoder.
arXiv Detail & Related papers (2021-11-19T11:43:59Z)
- End-to-End Diarization for Variable Number of Speakers with Local-Global
Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
arXiv Detail & Related papers (2021-05-05T14:55:29Z)
- End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition speech frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv Detail & Related papers (2020-12-18T05:31:07Z)
- BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a
Variable Number of Speakers [20.22005716662987]
We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.
For unlimited-latency BW-EDA-EEND, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND.
For limited-latency BW-EDA-EEND, which produces diarization outputs block-by-block as audio arrives, we show accuracy comparable to the offline clustering-based system.
arXiv Detail & Related papers (2020-11-05T06:42:31Z)
- Integrating end-to-end neural and clustering-based diarization: Getting
the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z)
- Neural Speaker Diarization with Speaker-Wise Chain Rule [45.60980782843576]
We propose a speaker-wise conditional inference method for speaker diarization.
We show that the proposed method can correctly produce diarization results with a variable number of speakers.
arXiv Detail & Related papers (2020-06-02T17:28:12Z)
- End-to-End Neural Diarization: Reformulating Speaker Diarization as
Simple Multi-label Classification [45.38809571153867]
We propose the End-to-End Neural Diarization (EEND) in which a neural network directly outputs speaker diarization results.
By feeding multi-speaker recordings with corresponding speaker segment labels, our model can be easily adapted to real conversations.
arXiv Detail & Related papers (2020-02-24T14:53:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.