N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
- URL: http://arxiv.org/abs/2303.00456v2
- Date: Thu, 1 Jun 2023 23:56:35 GMT
- Title: N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
- Authors: Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian
- Abstract summary: We propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input.
By transferring knowledge from the pre-trained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline.
- Score: 40.402050390096456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Error correction models form an important part of Automatic Speech
Recognition (ASR) post-processing to improve the readability and quality of
transcriptions. Most prior works use the 1-best ASR hypothesis as input and
therefore can only perform correction by leveraging the context within one
sentence. In this work, we propose a novel N-best T5 model for this task, which
is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By
transferring knowledge from the pre-trained language model and obtaining richer
information from the ASR decoding space, the proposed approach outperforms a
strong Conformer-Transducer baseline. Another issue with standard error
correction is that the generation process is not well guided. To address this, a
constrained decoding process, either based on the N-best list or an ASR
lattice, is used, which allows additional information to be propagated.
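As a concrete illustration of the N-best input format, here is a minimal sketch using the Hugging Face transformers T5 API; the `<sep>` separator, the `t5-base` checkpoint, and the example hypotheses are illustrative assumptions, not the paper's exact configuration.
```python
# Minimal sketch of N-best-style input construction for a T5 corrector.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def correct_with_nbest(nbest_hypotheses):
    """Concatenate the ASR N-best list into one source sequence and
    decode a single corrected transcription."""
    # Joining the hypotheses lets the model compare them against each other.
    source = " <sep> ".join(nbest_hypotheses)  # separator is an assumption
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

nbest = [
    "i saw the see today",   # 1-best, contains an error
    "i saw the sea today",
    "i saw to see today",
]
print(correct_with_nbest(nbest))
```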
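The constrained decoding idea can be approximated by restricting generation to tokens that appear somewhere in the N-best list. The sketch below uses the `prefix_allowed_tokens_fn` hook of `generate` as one way to express such a constraint; this token-level restriction is coarser than the paper's N-best or lattice-based scheme.
```python
# Rough sketch of N-best-constrained decoding; reuses the model and
# tokenizer from the previous sketch.
def nbest_constrained_generate(model, tokenizer, nbest_hypotheses):
    source = " <sep> ".join(nbest_hypotheses)
    inputs = tokenizer(source, return_tensors="pt", truncation=True)

    # Collect every token id seen in any hypothesis, plus EOS.
    allowed = set()
    for hyp in nbest_hypotheses:
        allowed.update(tokenizer(hyp).input_ids)
    allowed.add(tokenizer.eos_token_id)
    allowed = sorted(allowed)

    def allowed_tokens(batch_id, prefix_ids):
        # Same restricted vocabulary at every decoding step.
        return allowed

    output_ids = model.generate(
        **inputs,
        max_length=128,
        num_beams=4,
        prefix_allowed_tokens_fn=allowed_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```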
Related papers
- FlanEC: Exploring Flan-T5 for Post-ASR Error Correction [25.931773686829796]
We present an encoder-decoder model that leverages Flan-T5 for post-Automatic Speech Recognition (ASR) Generative Speech Error Correction (GenSEC).
We explore its application within the GenSEC framework to enhance ASR outputs by mapping n-best hypotheses into a single output sentence.
Specifically, we investigate whether scaling the training data and incorporating diverse datasets can lead to significant improvements in post-ASR error correction.
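A hypothetical instruction-style prompt in the spirit of this n-best-to-sentence mapping; the actual FlanEC prompt wording is an assumption here, not reproduced from the paper.
```python
# Build an instruction-style prompt over the ranked hypotheses.
def build_gensec_prompt(nbest_hypotheses):
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest_hypotheses))
    return (
        "The following are ranked hypotheses from a speech recognizer. "
        "Produce the single most likely correct transcription.\n"
        f"{hyps}\nTranscription:"
    )
```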
arXiv Detail & Related papers (2025-01-22T16:06:04Z)
- ASR Error Correction using Large Language Models [4.75940708384553]
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions.
This work investigates the use of large language models (LLMs) for error correction across diverse scenarios.
arXiv Detail & Related papers (2024-09-14T23:33:38Z)
- UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction [18.97378605403447]
We propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction.
Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect.
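A skeleton of the three-stage idea, with illustrative component interfaces rather than the authors' code.
```python
# Detector-Generator-Selector pipeline skeleton (interfaces assumed).
def ucorrect_style_pipeline(hypothesis, detector, generator, selector):
    # Detector: flag positions in the hypothesis likely to be wrong.
    suspicious = detector(hypothesis)               # e.g. list of indices
    # Generator: propose candidate replacements for each flagged span.
    candidates = generator(hypothesis, suspicious)  # list of sentences
    # Selector: score candidates (e.g. with an LM) and keep the best.
    return selector([hypothesis] + candidates)
```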
arXiv Detail & Related papers (2024-01-11T06:30:07Z)
- Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition [10.62060432965311]
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR).
Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts.
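A generic sketch of cross-modal fusion, in which text hidden states cross-attend to acoustic features; the actual adapter design in the paper differs in its details.
```python
# Generic cross-modal fusion layer (dimensions are illustrative).
import torch.nn as nn

class AcousticFusion(nn.Module):
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_states, acoustic_feats):
        # Queries come from the LM; keys/values from the audio encoder.
        fused, _ = self.cross_attn(text_states, acoustic_feats, acoustic_feats)
        return self.norm(text_states + fused)  # residual connection
```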
arXiv Detail & Related papers (2023-10-10T09:04:33Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
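To make the benchmark structure concrete, here is a hypothetical record layout for one N-best/reference pair; the field names are assumptions, not the released HyPoradise schema.
```python
# Hypothetical layout of a single benchmark pair.
example_pair = {
    "utterance_id": "utt_000001",
    "nbest": [
        "turn of the lights in the kitchen",
        "turn off the lights in the kitchen",
        "turn of the light in the kitchen",
    ],
    "reference": "turn off the lights in the kitchen",
}
```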
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
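In our notation (not necessarily the paper's), the generalization replaces the fixed RNN-T alignment set with the paths of a supervision graph G, where B is the usual blank-removal mapping and Pi(G) is the set of alignments accepted by G:
```latex
% Standard RNN-T: marginalize over all alignments that collapse to y.
\mathcal{L}_{\text{RNN-T}} = -\log \sum_{a \in \mathcal{B}^{-1}(y)} P(a \mid x)
% Graph-based generalization: marginalize over the paths of a label graph G.
\mathcal{L}_{\text{graph}} = -\log \sum_{a \in \Pi(G)} P(a \mid x)
```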
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses [0.0]
Spoken Language Understanding (SLU) parses speech into semantic structures like dialog acts and slots.
We show that our approach significantly outperforms the prior state-of-the-art when subjected to the low data regime.
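One common way to realize this, sketched here under assumed model and label choices, is to concatenate the hypotheses with [SEP] and feed them to a BERT-style classifier.
```python
# Feed multiple ASR hypotheses jointly to an SLU classifier.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=10  # e.g. 10 dialog-act classes (assumed)
)

nbest = ["play some jazz", "play some chairs", "lay some jazz"]
inputs = tokenizer(" [SEP] ".join(nbest), return_tensors="pt")
logits = model(**inputs).logits  # dialog-act scores from all hypotheses
```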
arXiv Detail & Related papers (2021-06-11T17:29:00Z)
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up inference by 6-9 times while maintaining accuracy (8-14% WER reduction) compared with the autoregressive correction model.
It outperforms popular NAR models adopted in neural machine translation by a large margin in accuracy.
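A simplified sketch of the edit-alignment step: align hypothesis and reference tokens with Levenshtein operations, then count how many target tokens each source token should expand to, the kind of per-token signal a NAR decoder can be trained on. The insertion-attachment rule below is a simplification of the paper's rule.
```python
# Derive per-source-token "durations" from a Levenshtein alignment.
def edit_alignment_durations(src, tgt):
    m, n = len(src), len(tgt)
    # Levenshtein DP table.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete source token
                           dp[i][j - 1] + 1,         # insert target token
                           dp[i - 1][j - 1] + cost)  # keep or substitute
    # Backtrace, counting target tokens consumed per source token.
    durations, i, j = [0] * m, m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            durations[i - 1] += 1   # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            i -= 1                  # deletion: expands to zero tokens
        else:
            durations[max(i - 1, 0)] += 1  # insertion: attach to neighbour
            j -= 1
    return durations

# Prints [3, 1, 1]: "a" absorbs the two inserted tokens; sums to len(tgt).
print(edit_alignment_durations("a b c".split(), "a x b b c".split()))
```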
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
- Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting [54.03356526990088]
We propose Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
SSR provides more fine-grained learning signals for text representations by supervising the model to rewrite imperfect spans to ground truth.
Our experiments with T5 models on various seq2seq tasks show that SSR can substantially improve seq2seq pre-training.
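A rough sketch of how such training pairs could be constructed: mask a span, fill it with an imperfect model, and supervise the corrector to rewrite the result back to the ground truth. The span policy and the imperfect filler are stand-ins, not the paper's procedure.
```python
# Construct one (source, target) pair for span-rewriting pre-training.
import random

def make_ssr_pair(tokens, imperfect_filler, span_len=3):
    start = random.randrange(max(len(tokens) - span_len, 1))
    # An imperfect seq2seq model proposes a (possibly wrong) span rewrite.
    noisy_span = imperfect_filler(tokens, start, span_len)
    source = tokens[:start] + noisy_span + tokens[start + span_len:]
    target = tokens  # supervision: rewrite the imperfect span to ground truth
    return source, target
```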
arXiv Detail & Related papers (2021-01-02T10:27:11Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
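A minimal sketch of a joint objective of this kind: a weighted sum of a token-level correction loss and an utterance-level LU loss; the weighting and head shapes are illustrative, not the paper's configuration.
```python
# Weighted multi-task loss over shared-encoder outputs.
import torch.nn.functional as F

def joint_loss(correction_logits, correction_targets,
               lu_logits, lu_targets, alpha=0.5):
    # Token-level cross-entropy for the correction head.
    l_corr = F.cross_entropy(
        correction_logits.view(-1, correction_logits.size(-1)),
        correction_targets.view(-1),
    )
    # Utterance-level cross-entropy for the LU head.
    l_lu = F.cross_entropy(lu_logits, lu_targets)
    return alpha * l_corr + (1 - alpha) * l_lu
```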
arXiv Detail & Related papers (2020-01-28T22:09:25Z)