DiaCorrect: Error Correction Back-end For Speaker Diarization
- URL: http://arxiv.org/abs/2309.08377v1
- Date: Fri, 15 Sep 2023 13:08:12 GMT
- Title: DiaCorrect: Error Correction Back-end For Speaker Diarization
- Authors: Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky
- Abstract summary: We propose an error correction framework, named DiaCorrect, to refine the output of a diarization system.
Our model consists of two parallel convolutional encoders and a transformer-based decoder.
- Score: 9.311650017389262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose an error correction framework, named DiaCorrect, to
refine the output of a diarization system in a simple yet effective way. This
method is inspired by error correction techniques in automatic speech
recognition. Our model consists of two parallel convolutional encoders and a
transformer-based decoder. By exploiting the interactions between the input
recording and the initial system's outputs, DiaCorrect can automatically
correct the initial speaker activities to minimize the diarization errors.
Experiments on 2-speaker telephony data show that the proposed DiaCorrect can
effectively improve the initial model's results. Our source code is publicly
available at https://github.com/BUTSpeechFIT/diacorrect.
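DiaCorrect is described as correcting the initial speaker activities so as to minimize diarization errors. As a rough illustration of what "diarization error" means at the frame level, the sketch below binarizes per-frame speaker posteriors and scores them with a frame-level diarization error rate (DER), taking the best mapping between system and reference speakers because a diarization system's speaker order is arbitrary. This is not code from the DiaCorrect repository: the function names, the 2-speaker setup, the 0.5 threshold, the absence of a collar, and all numbers are assumptions made for illustration, and the "corrected" posteriors are invented stand-ins, not DiaCorrect outputs.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # frames x speakers


def binarize(probs: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Turn per-frame speaker posteriors into 0/1 activities (assumed threshold)."""
    return [[1 if p > threshold else 0 for p in frame] for frame in probs]


def frame_der(ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]) -> float:
    """Frame-level DER, minimised over speaker permutations of `hyp`.

    Per frame, max(N_ref, N_hyp) - N_correct equals
    missed speech + false alarm + speaker confusion.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    best = None
    for perm in permutations(range(n_spk)):
        errors = 0
        for r, h in zip(ref, hyp):
            h_mapped = [h[p] for p in perm]  # reorder system speakers
            n_ref, n_hyp = sum(r), sum(h_mapped)
            n_correct = sum(a & b for a, b in zip(r, h_mapped))
            errors += max(n_ref, n_hyp) - n_correct
        best = errors if best is None else min(best, errors)
    return best / total_ref


# Toy data (invented): 6 frames, 2 speakers, frame 2 is overlapped speech.
ref = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
# Hypothetical initial posteriors: speaker order swapped, one missed speaker
# in the overlap, one missed frame, one false alarm.
initial_probs = [[0.2, 0.9], [0.1, 0.8], [0.4, 0.7], [0.6, 0.3], [0.3, 0.4], [0.7, 0.1]]
# Hypothetical refined posteriors, standing in for a corrector's output.
corrected_probs = [[0.1, 0.9], [0.2, 0.9], [0.8, 0.7], [0.9, 0.2], [0.6, 0.3], [0.2, 0.1]]

print(frame_der(ref, binarize(initial_probs)))    # 0.5
print(frame_der(ref, binarize(corrected_probs)))  # 0.0
```

Standard scoring tools operate on time segments, often with a forgiveness collar around speaker boundaries, rather than on fixed frames; this frame-level version only conveys the idea of the error being minimized.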
Related papers
- Speaker Tagging Correction With Non-Autoregressive Language Models [0.0]
We propose a speaker tagging correction system based on a non-autoregressive language model.
We show that the employed error correction approach leads to reductions in word diarization error rate (WDER) on two datasets.
(2024-08-30T11:02:17Z)
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It uses a neural sequence tagger that learns to correct an ASR (Automatic Speech Recognition) hypothesis word by word, and a corrector module that applies the corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
(2024-06-11T09:52:33Z)
- LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction [49.0746090186582]
Over-correction is a critical problem in the Chinese grammatical error correction (CGEC) task.
Recent work using model ensemble methods can effectively mitigate over-correction and improve the precision of the GEC system.
We propose the LM-Combiner, a rewriting model that can directly modify the over-correction of GEC system outputs without a model ensemble.
(2024-03-26T06:12:21Z)
- One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition [50.055765860343286]
This paper presents a novel framework for joint speaker diarization and automatic speech recognition.
The framework, named SLIDAR, can process arbitrary length inputs and can handle any number of speakers.
Experiments performed on monaural recordings from the AMI corpus confirm the effectiveness of the method in both close-talk and far-field speech scenarios.
(2023-10-02T23:03:30Z)
- Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction [4.409889336732851]
Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words.
This approach can lead to speaker errors, especially around speaker turns and regions of speaker overlap.
We propose a novel second-pass speaker error correction system using lexical information.
(2023-06-15T17:47:41Z)
- SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition [116.31926128970585]
We propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.
Compared with implicit error detection with CTC loss, SoftCorrect provides an explicit signal about which words are incorrect.
Experiments on the AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves CER reductions of 26.1% and 9.4%, respectively.
(2022-12-02T09:11:32Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by 19.2% relative to the method without decoder pre-training.
(2022-03-31T15:33:56Z)
- FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition [92.12910821300034]
We propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy.
FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline.
(2021-09-29T13:48:03Z)
- End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification [45.38809571153867]
We propose the End-to-End Neural Diarization (EEND) in which a neural network directly outputs speaker diarization results.
Our model can easily be adapted to real conversations by feeding it multi-speaker recordings with the corresponding speaker segment labels.
(2020-02-24T14:53:32Z)
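The End-to-End Neural Diarization entry frames diarization as multi-label classification: for every frame, the network emits one activity posterior per speaker. Because the order of the output speakers is arbitrary, EEND-style models are trained with a permutation-invariant (permutation-free) binary cross-entropy that takes the minimum over label permutations. The pure-Python sketch below illustrates that loss on invented numbers. The function names are hypothetical and this is not code from either paper; practical implementations compute the loss on tensors, and the number of permutations grows factorially with the speaker count.

```python
import math
from itertools import permutations
from typing import Sequence, Tuple


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one posterior p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce(probs: Sequence[Sequence[float]],
            labels: Sequence[Sequence[int]]) -> Tuple[float, Tuple[int, ...]]:
    """Mean frame-wise BCE under the speaker permutation that minimises it.

    Output s is scored against label column perm[s]; returns (loss, perm).
    """
    n_spk = len(labels[0])
    n_terms = len(labels) * n_spk
    best_loss, best_perm = math.inf, tuple(range(n_spk))
    for perm in permutations(range(n_spk)):
        loss = sum(
            bce(p_t[s], y_t[perm[s]])
            for p_t, y_t in zip(probs, labels)
            for s in range(n_spk)
        ) / n_terms
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (invented): the network's outputs are in the opposite order
# to the reference label columns, which the permutation search should detect.
labels = [[1, 0], [1, 1], [0, 1]]
probs = [[0.1, 0.9], [0.8, 0.9], [0.9, 0.2]]
loss, perm = pit_bce(probs, labels)
print(perm, round(loss, 4))  # (1, 0) 0.1446
```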