Related papers: Open Challenge for Correcting Errors of Speech Recognition Systems

Context Biasing for Pronunciations-Orthography Mismatch in Automatic Speech Recognition [56.972851337263755]
We propose a method which allows corrections of substitution errors to improve the recognition accuracy of challenging words. We show that with this method we get a relative improvement in biased word error rate of up to 11%, while maintaining a competitive overall word error rate.
arXiv Detail & Related papers (2025-06-23T14:42:03Z)
Exploring Generative Error Correction for Dysarthric Speech Recognition [12.584296717901116]
We propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025. We assess different configurations of model scales and training strategies, incorporating specific hypothesis selection to improve transcription accuracy. We provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.
arXiv Detail & Related papers (2025-05-26T16:06:31Z)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits [82.8859060022651]
We introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. Subjective evaluations confirm that speech edited using this novel technique is more challenging to detect than conventional cut-and-paste methods. Despite human difficulty, experimental results demonstrate that self-supervised-based detectors can achieve remarkable performance in detection, localization, and generalization.
arXiv Detail & Related papers (2025-01-07T14:17:47Z)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
Speaker Tagging Correction With Non-Autoregressive Language Models [0.0]
We propose a speaker tagging correction system based on a non-autoregressive language model. We show that the employed error correction approach leads to reductions in word diarization error rate (WDER) on two datasets.
arXiv Detail & Related papers (2024-08-30T11:02:17Z)
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors [0.4681661603096333]
In a spoken dialogue system, an NLU model is preceded by a speech recognition system that can deteriorate the performance of natural language understanding. This paper proposes a method for investigating the impact of speech recognition errors on the performance of natural language understanding models.
arXiv Detail & Related papers (2023-10-25T13:07:07Z)
ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction [30.486396813844195]
We present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction. Experimental results across five datasets demonstrate that our proposed method achieves significantly lower word error rates (WERs) than previous approaches.
arXiv Detail & Related papers (2023-10-08T11:40:30Z)
Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances. We also propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems. The proposed technique reduces the WER by more than 6% relative and the average last token emission latency by more than 40 ms.
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
End-to-end Speech-to-Punctuated-Text Recognition [23.44236710364419]
Punctuation marks are important for the readability of speech recognition results. Conventional automatic speech recognition systems do not produce punctuation marks. We propose an end-to-end model that takes speech as input and outputs punctuated texts.
arXiv Detail & Related papers (2022-07-07T08:58:01Z)
Towards End-to-end Unsupervised Speech Recognition [120.4915001021405]
We introduce wav2vec-U 2.0, which does away with all audio-side pre-processing and improves accuracy through better architecture. In addition, we introduce an auxiliary self-supervised objective that ties model predictions back to the input. Experiments show that wav2vec-U 2.0 improves unsupervised recognition results across different languages while being conceptually simpler.
arXiv Detail & Related papers (2022-04-05T21:22:38Z)
Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN) based automatic speech recognition technologies. This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering [63.72278693825945]
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow. We propose CADNet, a novel contextualized attention-based distillation approach. We conduct extensive experiments on the Spoken-CoQA dataset and demonstrate that our approach achieves remarkable performance.
arXiv Detail & Related papers (2020-10-21T15:17:18Z)
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning [35.36769027019856]
We present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances. We show that our method achieves excellent performance while using only small amounts of speech signal.
arXiv Detail & Related papers (2020-08-07T12:44:08Z)
Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions. Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks. This paper presents approaches aimed at achieving two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.