Adapting Speaker Embeddings for Speaker Diarisation
- URL: http://arxiv.org/abs/2104.02879v1
- Date: Wed, 7 Apr 2021 03:04:47 GMT
- Title: Adapting Speaker Embeddings for Speaker Diarisation
- Authors: Youngki Kwon, Jee-weon Jung, Hee-Soo Heo, You Jin Kim, Bong-Jin Lee,
Joon Son Chung
- Abstract summary: The goal of this paper is to adapt speaker embeddings for solving the problem of speaker diarisation.
We propose three techniques that can be used to better adapt the speaker embeddings for diarisation: dimensionality reduction, attention-based embedding aggregation, and non-speech clustering.
The results demonstrate that all three techniques contribute positively to the performance of the diarisation system achieving an average relative improvement of 25.07% in terms of diarisation error rate over the baseline.
- Score: 30.383712356205084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this paper is to adapt speaker embeddings for solving the problem
of speaker diarisation. The quality of speaker embeddings is paramount to the
performance of speaker diarisation systems. Despite this, prior works in the
field have directly used embeddings designed only to be effective on the
speaker verification task. In this paper, we propose three techniques that can
be used to better adapt the speaker embeddings for diarisation: dimensionality
reduction, attention-based embedding aggregation, and non-speech clustering. A
wide range of experiments is performed on various challenging datasets. The
results demonstrate that all three techniques contribute positively to the
performance of the diarisation system achieving an average relative improvement
of 25.07% in terms of diarisation error rate over the baseline.
Related papers
- Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios [0.9094127664014627]
End-to-end neural speaker diarization systems are able to address the speaker diarization task while effectively handling speech overlap.
This work explores the incorporation of speaker information embeddings into the end-to-end systems to enhance the speaker discriminative capabilities.
arXiv Detail & Related papers (2024-07-01T14:26:28Z) - Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - In search of strong embedding extractors for speaker diarisation [49.7017388682077]
We tackle two key problems when adopting EEs for speaker diarisation.
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
We show that better performance on widely adopted speaker verification evaluation protocols does not lead to better diarisation performance.
We propose two data augmentation techniques to alleviate the second problem, making embedding extractors aware of overlapped speech or speaker change input.
arXiv Detail & Related papers (2022-10-26T13:00:29Z) - Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel
Multi-party Meeting Transcription Challenge [4.022057598291766]
Royalflush speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription Challenge.
System comprises speech enhancement, overlapped speech detection, speaker embedding extraction, speaker clustering, speech separation and system fusion.
arXiv Detail & Related papers (2022-02-10T03:35:05Z) - PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation
Extraction [90.55375210094995]
Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise.
We propose an end-to-end deep learning framework, dubbed PL-EESR, for robust speaker representation extraction.
arXiv Detail & Related papers (2021-10-03T07:05:29Z) - End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into clusters of the number of speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv Detail & Related papers (2020-12-18T05:31:07Z) - Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z) - Speaker Re-identification with Speaker Dependent Speech Enhancement [37.33388614967888]
This paper introduces a novel approach that cascades speech enhancement and speaker recognition.
The proposed approach is evaluated using the Voxceleb1 dataset, which aims to assess speaker recognition in real world situations.
arXiv Detail & Related papers (2020-05-15T23:02:10Z) - Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.