Chain of Correction for Full-text Speech Recognition with Large Language Models
- URL: http://arxiv.org/abs/2504.01519v2
- Date: Wed, 20 Aug 2025 02:50:14 GMT
- Title: Chain of Correction for Full-text Speech Recognition with Large Language Models
- Authors: Zhiyuan Tang, Dong Wang, Zhikai Zhou, Yong Liu, Shen Huang, Shidong Shang
- Abstract summary: Chain of Correction (CoC) is a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs.
- Score: 21.37485126269991
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC's performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and over-rephrasing, extrapolate CoC on extra-long ASR outputs, and explore using other types of information to guide error correction.
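The correction scheme the abstract describes is concrete enough to sketch. Below is a minimal, hedged illustration of the segment-by-segment chat loop, assuming a generic chat-completion function `llm_chat` and invented prompt wording; neither the function nor the prompts come from the paper.

```python
# Minimal sketch of a Chain-of-Correction style chat loop: segments are
# corrected one per turn, while the chat history carries earlier
# corrections and the full transcript supplies global context.
# `llm_chat` is a hypothetical stand-in for any chat-completion API.
from typing import Callable, Dict, List

Message = Dict[str, str]

def chain_of_correction(segments: List[str], full_text: str,
                        llm_chat: Callable[[List[Message]], str]) -> str:
    messages: List[Message] = [{
        "role": "system",
        "content": ("You correct ASR errors segment by segment. Use the full "
                    "transcript for context and change only what is wrong.\n"
                    f"Full transcript:\n{full_text}"),
    }]
    corrected: List[str] = []
    for seg in segments:
        messages.append({"role": "user",
                         "content": f"Correct this segment: {seg}"})
        fixed = llm_chat(messages)          # one correction turn per segment
        messages.append({"role": "assistant", "content": fixed})
        corrected.append(fixed)
    return " ".join(corrected)
```

Keeping every turn in `messages` is what makes this a chain: each segment is corrected in light of all previous corrections as well as the full transcript.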
Related papers
- MTCSC: Retrieval-Augmented Iterative Refinement for Chinese Spelling Correction [3.2706233566525613]
Chinese Spelling Correction aims to detect and correct erroneous tokens in sentences.
LLMs have shown remarkable success in identifying and rectifying potential errors.
Existing CSC tasks impose rigid constraints requiring input and output lengths to be identical.
arXiv Detail & Related papers (2025-04-26T14:48:44Z)
- Full-text Error Correction for Chinese Speech Recognition with Large Language Model [11.287933170894311]
Large Language Models (LLMs) have demonstrated substantial potential for error correction in Automatic Speech Recognition (ASR). This paper investigates the effectiveness of LLMs for error correction in full text generated by ASR systems from longer speech recordings.
arXiv Detail & Related papers (2024-09-12T06:50:45Z)
- A Coin Has Two Sides: A Novel Detector-Corrector Framework for Chinese Spelling Correction [79.52464132360618]
Chinese Spelling Correction (CSC) stands as a foundational Natural Language Processing (NLP) task.
We introduce a novel approach based on an error detector-corrector framework.
Our detector is designed to yield two error detection results, each characterized by high precision and recall.
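A toy sketch of how one scorer can yield the two detection results mentioned above, assuming they come from strict and loose thresholds over a per-token error score; the thresholds, labels, and scorer are illustrative assumptions, not the paper's design.

```python
# Two detection results from one per-token error scorer: a strict threshold
# gives a high-precision mask, a loose threshold a high-recall mask, and the
# merged labels tell a downstream corrector how aggressively to act.
from typing import List

def detect(scores: List[float], threshold: float) -> List[bool]:
    return [s > threshold for s in scores]

def detection_labels(scores: List[float]) -> List[str]:
    precise = detect(scores, 0.9)   # fires rarely: high precision
    loose = detect(scores, 0.3)     # fires often: high recall
    # "error": both masks agree, rewrite confidently;
    # "suspect": only the loose mask fires, let the corrector re-check;
    # "ok": copy the token through unchanged.
    return ["error" if p else ("suspect" if l else "ok")
            for p, l in zip(precise, loose)]

print(detection_labels([0.95, 0.4, 0.1]))  # ['error', 'suspect', 'ok']
```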
arXiv Detail & Related papers (2024-09-06T09:26:45Z)
- CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models [0.0]
This paper introduces Context Leveraging OCR Correction (CLOCR-C).
It uses the infilling and context-adaptive abilities of transformer-based language models (LMs) to improve OCR quality.
The study aims to determine whether LMs can perform post-OCR correction and improve downstream NLP tasks, and to assess the value of providing socio-cultural context as part of the correction process.
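A minimal sketch of context-conditioned post-OCR prompting in the spirit of CLOCR-C; the template wording and the `llm` stand-in are assumptions, not the paper's actual prompt.

```python
# Illustrative prompt construction for post-OCR correction: the noisy OCR
# text is paired with socio-cultural context (e.g. publication era and
# genre) before being handed to a text-generation model.
from typing import Callable

def clocr_c_prompt(ocr_text: str, context: str) -> str:
    return ("The following text was produced by OCR and may contain errors.\n"
            f"Context: {context}\n"
            "Reproduce the text with the OCR errors corrected, "
            "changing nothing else.\n\n"
            f"OCR text:\n{ocr_text}")

def correct_ocr(ocr_text: str, context: str,
                llm: Callable[[str], str]) -> str:
    # Example context: "A British provincial newspaper article from the 1850s."
    return llm(clocr_c_prompt(ocr_text, context))
```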
arXiv Detail & Related papers (2024-08-30T17:26:05Z)
- Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction [40.11364098789309]
Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora.
Two data augmentation methods are widely adopted: (1) Random Replacement with the guidance of confusion sets and (2) OCR/ASR-based Generation that simulates character misuse.
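A minimal sketch of the first augmentation method, Random Replacement guided by confusion sets; the tiny confusion set and the corruption rate are illustrative only.

```python
# Confusion-set-guided Random Replacement for building CSC training pairs:
# each character is corrupted with small probability by sampling a visually
# or phonetically similar character from its confusion set.
import random

CONFUSION = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}  # toy confusion sets

def corrupt(sentence: str, p: float = 0.05, seed: int = 0) -> str:
    rng = random.Random(seed)
    chars = []
    for ch in sentence:
        candidates = CONFUSION.get(ch)
        if candidates and rng.random() < p:
            chars.append(rng.choice(candidates))  # inject a plausible misuse
        else:
            chars.append(ch)
    return "".join(chars)

# (corrupt(s), s) then serves as a (noisy, clean) training pair.
```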
arXiv Detail & Related papers (2024-07-22T09:26:35Z)
- Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization.
Our framework uses a diverse set of LLM prompts to identify factual inconsistencies.
We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
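A sketch of the DEEP recipe summarized above, under stated assumptions: several prompt phrasings each cast a binary vote, and a calibration map converts the vote share into a probability. `ask` stands in for an LLM call, and the calibration table is a hand-set stub, not fitted values from the paper.

```python
# Ensemble of diverse prompts for factual-consistency detection, with a
# calibration step mapping the raw vote share to a probability. In practice
# the calibration would be fit on held-out labeled data (e.g. isotonic
# regression); the table below is illustrative.
from typing import Callable

PROMPTS = [
    "Does the summary contain only facts supported by the document? {doc} || {summary}",
    "Is every claim in this summary entailed by the source? {doc} || {summary}",
    "Answer yes if the summary is factually consistent with the document. {doc} || {summary}",
]

def deep_score(doc: str, summary: str, ask: Callable[[str], str]) -> float:
    votes = [ask(p.format(doc=doc, summary=summary))
             .strip().lower().startswith("y") for p in PROMPTS]
    raw = sum(votes) / len(votes)           # vote share in {0, 1/3, 2/3, 1}
    calibration = {0.0: 0.05, 1 / 3: 0.3, 2 / 3: 0.7, 1.0: 0.95}
    # Snap to the nearest calibrated point and return its probability.
    return calibration[min(calibration, key=lambda k: abs(k - raw))]
```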
- Cross-modal Active Complementary Learning with Self-refining Correspondence [54.61307946222386]
We propose a Cross-modal Robust Complementary Learning framework (CRCL) to improve the robustness of existing methods.
The Active Complementary Loss (ACL) exploits active and complementary learning losses to reduce the risk of providing erroneous supervision.
The Self-refining Correspondence Correction (SCC) utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences.
arXiv Detail & Related papers (2023-10-26T15:15:11Z)
- Chinese Spelling Correction as Rephrasing Language Model [63.65217759957206]
We study Chinese Spelling Correction (CSC), which aims to detect and correct the potential spelling errors in a given sentence.
Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs.
We propose Rephrasing Language Model (ReLM), where the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging.
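A toy construction of a ReLM-style training example per the summary above; the separator and mask token names, and the one-slot-per-character budget, are assumptions about the format rather than the paper's exact setup.

```python
# ReLM-style example: the source sentence is followed by mask slots, and
# supervision applies only to the slots, which must decode to the rephrased
# (corrected) sentence. Positions assume a tokenizer that treats [SEP] and
# each [MASK] as single tokens.
from typing import List, Optional, Tuple

def relm_example(src: str, tgt: str, sep: str = "[SEP]",
                 mask: str = "[MASK]") -> Tuple[str, List[Optional[str]]]:
    inp = src + sep + mask * len(tgt)       # one slot per target character
    # Labels: None over the source and separator (not supervised), then the
    # corrected sentence character by character over the mask slots.
    labels: List[Optional[str]] = [None] * (len(src) + 1) + list(tgt)
    return inp, labels

inp, labels = relm_example("他的词很好听", "他的歌很好听")
```

The point of the format is that the model rephrases the whole sentence into the slots instead of making character-to-character tagging decisions.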
arXiv Detail & Related papers (2023-08-17T06:04:28Z)
- You Can Generate It Again: Data-to-Text Generation with Verification and Correction Prompting [24.738004421537926]
Small language models like T5 excel in generating high-quality text for data-to-text tasks. However, they frequently miss keywords, which is considered one of the most severe and common errors in this task. We explore the potential of using feedback systems to enhance semantic fidelity in smaller language models for data-to-text generation tasks.
arXiv Detail & Related papers (2023-06-28T05:34:25Z)
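A sketch of a verification-and-correction feedback loop for the keyword-coverage problem described above; `generate` stands in for a small LM such as T5, and the retry prompt wording is an assumption.

```python
# Generate, verify keyword coverage, and re-prompt with the exact omissions:
# a simple feedback loop for improving semantic fidelity in data-to-text
# generation. Substring matching is a deliberate simplification.
from typing import Callable, List

def generate_with_feedback(data: str, keywords: List[str],
                           generate: Callable[[str], str],
                           max_rounds: int = 3) -> str:
    prompt = f"Describe the following data in fluent text: {data}"
    text = generate(prompt)
    for _ in range(max_rounds):
        missing = [k for k in keywords if k not in text]  # verification step
        if not missing:
            break
        # Correction step: re-prompt, pointing out exactly what was omitted.
        prompt = (f"Describe the following data: {data}\n"
                  f"Your previous output omitted: {', '.join(missing)}. "
                  f"Include them.\nPrevious output: {text}")
        text = generate(prompt)
    return text
```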
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistencies in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)
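A schematic of the reverse-reconstruct-and-compare idea behind RCoT; the prompt wording and the `llm` stub are assumptions, not the paper's prompts.

```python
# Reversing chain-of-thought: reconstruct the problem from the model's own
# solution, compare it with the original problem, and feed any mismatch
# back as fine-grained feedback for a corrected solution.
from typing import Callable

def rcot_revise(problem: str, solution: str,
                llm: Callable[[str], str]) -> str:
    reconstructed = llm(
        f"State the problem that this solution solves:\n{solution}")
    diff = llm(
        "List any conditions that differ between these two problem "
        "statements, or say NONE.\n"
        f"Original: {problem}\nReconstructed: {reconstructed}")
    if diff.strip().upper().startswith("NONE"):
        return solution  # no inconsistency detected
    # Use the detected mismatches as feedback for a second attempt.
    return llm(f"Problem: {problem}\nDraft solution: {solution}\n"
               f"The draft misreads these conditions: {diff}\n"
               "Solve again correctly.")
```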
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
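A minimal sketch of the RFEC pipeline summarized above, with word-overlap retrieval standing in for the paper's retrieval step and an externally supplied entity map standing in for its learned entity-error detector.

```python
# Entity-retrieval post-editing: pick the document sentences most similar
# to the summary as evidence, then swap summary entities that contradict
# that evidence. Both components here are crude stand-ins for RFEC's
# learned models.
from typing import Dict, List

def retrieve_evidence(document: List[str], summary: str,
                      k: int = 3) -> List[str]:
    words = set(summary.lower().split())
    scored = sorted(document,
                    key=lambda s: len(words & set(s.lower().split())),
                    reverse=True)
    return scored[:k]

def correct_entities(summary: str, evidence: List[str],
                     entity_map: Dict[str, str]) -> str:
    # entity_map pairs each wrong entity with its replacement; producing
    # this map from the evidence is the learned detection step in RFEC.
    for wrong, right in entity_map.items():
        if wrong in summary and any(right in s for s in evidence):
            summary = summary.replace(wrong, right)
    return summary
```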
- Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
Most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
arXiv Detail & Related papers (2021-06-03T05:56:57Z)
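A toy rendering of the Tail-to-Tail intuition stated above: copy most tokens straight through and re-predict only the estimated error positions in one parallel pass. The scorer and predictor are stubs, this handles only the same-length case, and the paper's actual model is a trained neural network.

```python
# Non-autoregressive correction sketch: every position decides
# independently whether to copy the source token or emit a predicted
# correction, so the output is produced in a single parallel pass rather
# than token by token.
from typing import Callable, List

def tail_to_tail(tokens: List[str],
                 error_prob: Callable[[List[str], int], float],
                 predict: Callable[[List[str], int], str],
                 threshold: float = 0.5) -> List[str]:
    return [predict(tokens, i) if error_prob(tokens, i) > threshold else tok
            for i, tok in enumerate(tokens)]
```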
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
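A sketch of the two-stage ESD/ESC split described above; both models are stubs and the span-markup format is an assumption. Rewriting only the flagged spans, rather than regenerating the whole sentence, is where the reported inference savings come from.

```python
# Two-stage pipeline: a cheap tagging model (ESD) marks erroneous spans,
# and a seq2seq model (ESC) rewrites only those spans while all other text
# is copied through unchanged. Spans are assumed non-overlapping.
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) character offsets

def esd_esc(text: str,
            detect_spans: Callable[[str], List[Span]],    # ESD: tagger
            correct_span: Callable[[str], str]) -> str:   # ESC: seq2seq
    out, prev = [], 0
    for start, end in sorted(detect_spans(text)):
        out.append(text[prev:start])        # untouched text is copied as-is
        # ESC sees the sentence with the span marked, outputs only the fix.
        marked = f"{text[:start]} <e> {text[start:end]} </e> {text[end:]}"
        out.append(correct_span(marked))
        prev = end
    out.append(text[prev:])
    return "".join(out)
```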
- A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models.
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
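A sketch of prediction-consistency denoising as the summary above describes; the exact-match agreement test and the relabel-or-drop policy are deliberate simplifications of the paper's procedure.

```python
# Denoise a GEC corpus with an existing model: where the model's own
# prediction disagrees with the annotated target, treat the pair as
# suspected noise and either relabel it with the model output or drop it.
from typing import Callable, List, Tuple

def denoise(pairs: List[Tuple[str, str]],
            model: Callable[[str], str],
            keep_model_output: bool = True) -> List[Tuple[str, str]]:
    cleaned = []
    for src, tgt in pairs:
        pred = model(src)
        if pred == tgt:
            cleaned.append((src, tgt))      # annotation looks consistent
        elif keep_model_output:
            cleaned.append((src, pred))     # relabel suspected noise
        # else: drop the inconsistent pair entirely
    return cleaned
```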