Related papers: PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction

PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction

URL: http://arxiv.org/abs/2302.05040v2
Date: Wed, 21 Jun 2023 17:44:58 GMT
Title: PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction
Authors: Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan
Abstract summary: We propose PATCorrect, a novel non-autoregressive (NAR) approach to reduce word error rate (WER) We demonstrate that PATCorrect consistently outperforms state-of-the-art NAR method on English corpus across different upstream ASR systems.
Score: 0.9502148118198473
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Speech-to-text errors made by automatic speech recognition (ASR) systems negatively impact downstream models. Error correction models as a post-processing text editing method have been recently developed for refining the ASR outputs. However, efficient models that meet the low latency requirements of industrial grade production systems have not been well studied. We propose PATCorrect-a novel non-autoregressive (NAR) approach based on multi-modal fusion leveraging representations from both text and phoneme modalities, to reduce word error rate (WER) and perform robustly with varying input transcription quality. We demonstrate that PATCorrect consistently outperforms state-of-the-art NAR method on English corpus across different upstream ASR systems, with an overall 11.62% WER reduction (WERR) compared to 9.46% WERR achieved by other methods using text only modality. Besides, its inference latency is at tens of milliseconds, making it ideal for systems with low latency requirements.

Related papers

Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs) To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods. Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition [10.62060432965311]
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR) Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts.
arXiv Detail & Related papers (2023-10-10T09:04:33Z)
Semi-Autoregressive Streaming ASR With Label Context [70.76222767090638]
We propose a streaming "semi-autoregressive" ASR model that incorporates the labels emitted in previous blocks as additional context. Experiments show that our method outperforms the existing streaming NAR model by 19% relative on Tedlium2, 16%/8% on Librispeech-100 clean/other test sets, and 19%/8% on the Switchboard(SWB)/Callhome(CH) test sets.
arXiv Detail & Related papers (2023-09-19T20:55:58Z)
Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose an efficient factual error correction system RFEC based on entities retrieval post-editing process. RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary. Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to do contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system. We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model. Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
arXiv Detail & Related papers (2022-02-02T17:32:59Z)
Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection [25.940199825317073]
We propose a cross-modal post-processing system for speech recognizers. It fuses acoustic features and textual features from different modalities. It joints a confidence estimator and an error corrector in multi-task learning fashion.
arXiv Detail & Related papers (2022-01-10T12:29:55Z)
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment. FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model. It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules. The errors of the ASR system can seriously downgrade the performance of the NLP modules. Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
arXiv Detail & Related papers (2021-03-25T05:15:43Z)
Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR) APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker. We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU) We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.