Lexical Speaker Error Correction: Leveraging Language Models for Speaker
Diarization Error Correction
- URL: http://arxiv.org/abs/2306.09313v1
- Date: Thu, 15 Jun 2023 17:47:41 GMT
- Title: Lexical Speaker Error Correction: Leveraging Language Models for Speaker
Diarization Error Correction
- Authors: Rohit Paturi, Sundararajan Srinivasan, Xiang Li
- Abstract summary: Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words.
This approach can lead to speaker errors especially around speaker turns and regions of speaker overlap.
We propose a novel second-pass speaker error correction system using lexical information.
- Score: 4.409889336732851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speaker diarization (SD) is typically used with an automatic speech
recognition (ASR) system to ascribe speaker labels to recognized words. The
conventional approach reconciles outputs from independently optimized ASR and
SD systems, where the SD system typically uses only acoustic information to
identify the speakers in the audio stream. This approach can lead to speaker
errors especially around speaker turns and regions of speaker overlap. In this
paper, we propose a novel second-pass speaker error correction system using
lexical information, leveraging the power of modern language models (LMs). Our
experiments across multiple telephony datasets show that our approach is both
effective and robust. Training and tuning only on the Fisher dataset, this
error correction approach leads to relative word-level diarization error rate
(WDER) reductions of 15-30% on three telephony datasets: RT03-CTS, Callhome
American English and held-out portions of Fisher.
Related papers
- Speaker Tagging Correction With Non-Autoregressive Language Models [0.0]
We propose a speaker tagging correction system based on a non-autoregressive language model.
We show that the employed error correction approach leads to reductions in word diarization error rate (WDER) on two datasets.
arXiv Detail & Related papers (2024-08-30T11:02:17Z) - AG-LSEC: Audio Grounded Lexical Speaker Error Correction [9.54540722574194]
Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines.
We propose to enhance and acoustically ground the Lexical Speaker Error Correction (LSEC) system with speaker scores directly derived from the existing SD pipeline.
This approach achieves significant relative WDER reductions in the range of 25-40% over the audio-based SD, ASR system and beats the LSEC system by 15-25% relative on RT03-CTS, Callhome American English and Fisher datasets.
arXiv Detail & Related papers (2024-06-25T04:20:49Z) - Online speaker diarization of meetings guided by speech separation [0.0]
Overlapped speech is notoriously problematic for speaker diarization systems.
We introduce a new speech separation-guided diarization scheme suitable for the online speaker diarization of long meeting recordings.
arXiv Detail & Related papers (2024-01-30T09:09:22Z) - DiariST: Streaming Speech Translation with Speaker Diarization [53.595990270899414]
We propose DiariST, the first streaming ST and SD solution.
It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector.
Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech.
arXiv Detail & Related papers (2023-09-14T19:33:27Z) - Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech [62.95422526044178]
We use Model Agnostic Meta-Learning (MAML) as the training algorithm of a multi-speaker TTS model.
We show that Meta-TTS can synthesize high speaker-similarity speech from few enrollment samples with fewer adaptation steps than the speaker adaptation baseline.
arXiv Detail & Related papers (2021-11-07T09:53:31Z) - End-to-End Speaker-Attributed ASR with Transformer [41.7739129773237]
This paper presents an end-to-end speaker-attributed automatic speech recognition system.
It jointly performs speaker counting, speech recognition and speaker identification for monaural multi-talker audio.
arXiv Detail & Related papers (2021-04-05T19:54:15Z) - Streaming Multi-talker Speech Recognition with Joint Speaker
Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on the Librispeech dataset -- a multi-talker dataset derived from Librispeech, and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z) - Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z) - Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z) - End-to-End Neural Diarization: Reformulating Speaker Diarization as
Simple Multi-label Classification [45.38809571153867]
We propose the End-to-End Neural Diarization (EEND) in which a neural network directly outputs speaker diarization results.
By feeding multi-speaker recordings with corresponding speaker segment labels, our model can be easily adapted to real conversations.
arXiv Detail & Related papers (2020-02-24T14:53:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.