Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
- URL: http://arxiv.org/abs/2505.24347v2
- Date: Thu, 05 Jun 2025 09:26:39 GMT
- Title: Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
- Authors: Yangui Fang, Baixu Cheng, Jing Peng, Xu Li, Yu Xi, Chengwei Zhang, Guohui Zhong
- Abstract summary: We propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: error pre-detection, chain-of-thought sub-tasks iterative correction, and reasoning process verification. Experiments on AISHELL-1, AISHELL-2, and Librispeech show that the GPT-4o model enhanced by our framework achieves 21%, 11%, 9%, and 11.4% relative reductions in CER/WER.
- Score: 4.304383298057423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR) error correction aims to correct recognition errors while preserving accurate text. Although traditional approaches demonstrate moderate effectiveness, LLMs offer a paradigm that eliminates the need for training and labeled data. However, directly using LLMs runs into the hallucination problem, which may lead to modification of text that is already correct. To address this problem, we propose the Reliable LLM Correction Framework (RLLM-CF), which consists of three stages: (1) error pre-detection, (2) chain-of-thought sub-tasks iterative correction, and (3) reasoning process verification. The advantage of our method is that it does not require additional information or fine-tuning of the model, and it ensures the correctness of the LLM correction under multi-pass programming. Experiments on AISHELL-1, AISHELL-2, and Librispeech show that the GPT-4o model enhanced by our framework achieves 21%, 11%, 9%, and 11.4% relative reductions in CER/WER.
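As a rough illustration of how the three stages described above could be wired together, the following is a minimal sketch of an RLLM-CF-style pipeline. The prompts, the `llm` callable, and the function names `pre_detect`, `cot_correct`, and `verify` are illustrative assumptions, not the authors' implementation, which this page does not include.

```python
# Minimal sketch of a three-stage ASR error-correction pipeline in the spirit of
# RLLM-CF (error pre-detection -> iterative CoT correction -> verification).
# Prompts and function names are illustrative assumptions, not the paper's code.
from typing import Callable

Llm = Callable[[str], str]  # wraps any chat client: prompt text in, answer text out


def pre_detect(llm: Llm, hypothesis: str) -> bool:
    """Stage 1: ask whether the ASR hypothesis contains errors at all.
    If not, the pipeline leaves the text untouched to avoid hallucinated edits."""
    answer = llm(
        "Does the following ASR transcript contain recognition errors? "
        "Answer only YES or NO.\n" + hypothesis
    )
    return answer.strip().upper().startswith("YES")


def cot_correct(llm: Llm, hypothesis: str, max_rounds: int = 3) -> str:
    """Stage 2: iterative chain-of-thought correction, one pass per round."""
    text = hypothesis
    for _ in range(max_rounds):
        revised = llm(
            "Think step by step about likely ASR errors (homophones, omissions, "
            "insertions) in the transcript below, then output only the corrected "
            "transcript.\n" + text
        ).strip()
        if revised == text:  # converged, stop early
            break
        text = revised
    return text


def verify(llm: Llm, hypothesis: str, corrected: str) -> bool:
    """Stage 3: check that every change fixes a genuine recognition error
    rather than rewriting text that was already correct."""
    verdict = llm(
        "Original transcript:\n" + hypothesis
        + "\nProposed correction:\n" + corrected
        + "\nIs every change a genuine fix of a recognition error? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")


def rllm_cf(llm: Llm, hypothesis: str) -> str:
    if not pre_detect(llm, hypothesis):
        return hypothesis  # no detected errors: return the input unchanged
    corrected = cot_correct(llm, hypothesis)
    return corrected if verify(llm, hypothesis, corrected) else hypothesis
```

With the official `openai` Python client, for example, `llm` could be supplied as `lambda p: client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": p}]).choices[0].message.content`, but any text-in/text-out interface works.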
Related papers
- Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs [0.0]
Self-correction is an important capability for large language models (LLMs). While LLMs can identify errors in user input, they exhibit a systematic 'Self-Correction Blind Spot'. Testing 14 models, we find an average 64.5% blind spot rate. Remarkably, simply appending "Wait" reduces blind spots by 89.3%, suggesting that the capability exists but requires activation; a minimal sketch of this probe appears after this list.
arXiv Detail & Related papers (2025-07-03T16:41:30Z) - MCTS-Refined CoT: High-Quality Fine-Tuning Data for LLM-Based Repository Issue Resolution [18.314436803012434]
The paper proposes MCTS-INE, an enhanced Monte Carlo Tree Search (MCTS)-based algorithm that dynamically validates and optimizes intermediate reasoning steps. Experiments on SWE-bench Lite and SWE-bench Verified demonstrate that LLMs fine-tuned with our CoT dataset achieve substantial improvements over baselines.
arXiv Detail & Related papers (2025-06-15T05:42:01Z) - High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning [84.52940628494879]
Large Language Models (LLMs) currently respond to every prompt, but they can produce incorrect answers when they lack knowledge or capability. We propose post-training an LLM to generate content only when it is confident in its correctness and to otherwise abstain.
arXiv Detail & Related papers (2025-06-04T15:16:21Z) - From Misleading Queries to Accurate Answers: A Three-Stage Fine-Tuning Method for LLMs [5.23164145730825]
Large language models (LLMs) exhibit excellent performance in natural language processing (NLP). Existing methods focus on correcting the output, but they often overlook the potential of improving the ability of LLMs to detect and correct misleading content in the input itself. We propose a novel three-stage fine-tuning method that enhances the ability of LLMs to detect and correct misleading information in the input.
arXiv Detail & Related papers (2025-04-15T15:16:45Z) - S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$^2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference. Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - LLM Robustness Against Misinformation in Biomedical Question Answering [50.98256373698759]
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering.
We evaluate the effectiveness and robustness of four LLMs against misinformation in answering biomedical questions.
arXiv Detail & Related papers (2024-10-27T16:23:26Z) - Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE). RISE injects predefined subtle errors into pivotal tokens in reasoning steps to construct hard pairs for error mitigation. Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Learning to Refine with Fine-Grained Natural Language Feedback [81.70313509881315]
We propose looking at refinement with feedback as a composition of three distinct LLM competencies.
A key property of the proposed Detect, Critique, Refine ("DCR") method is that the step 2 critique model can give fine-grained feedback about errors.
We show that models of different capabilities benefit from refining with DCR on the task of improving factual consistency of document grounded summaries.
arXiv Detail & Related papers (2024-07-02T16:15:01Z) - Aligning the Objective of LLM-based Program Repair [14.935596175148586]
This paper investigates a new approach to adapting large language models (LLMs) to program repair. Our core insight is that the APR capability of LLMs can be greatly improved by simply aligning the output to their training objective. Based on this insight, we designed D4C, a straightforward prompting framework for APR.
arXiv Detail & Related papers (2024-04-13T02:36:40Z) - Evaluating LLMs at Detecting Errors in LLM Responses [30.645694514606507]
This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs.
We use ReaLMistake to evaluate error detectors based on 12 Large Language Models.
arXiv Detail & Related papers (2024-04-04T17:19:47Z) - Lyra: Orchestrating Dual Correction in Automated Theorem Proving [63.115422781158934]
Lyra is a new framework that employs two distinct correction mechanisms: Tool Correction and Conjecture Correction.
Tool Correction contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof.
Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts.
arXiv Detail & Related papers (2023-09-27T17:29:41Z)
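As referenced in the Self-Correction Bench entry above, the "Wait" finding is concrete enough to sketch. The snippet below shows one way such a probe could be wired against a generic chat interface; the `Chat` callable, the message format, and the function name are assumptions for illustration, not the benchmark's actual harness.

```python
# Minimal sketch of the "append Wait" probe from the Self-Correction Bench entry:
# after the model answers, one extra user turn containing only "Wait" gives it an
# explicit opening to re-examine and correct its previous output.
from typing import Callable, Dict, List

Chat = Callable[[List[Dict[str, str]]], str]  # messages in, assistant text out


def answer_with_wait_probe(chat: Chat, question: str) -> str:
    messages = [{"role": "user", "content": question}]
    first = chat(messages)                    # initial answer, possibly containing an error
    messages += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": "Wait"},  # the activation cue reported in the paper
    ]
    return chat(messages)                     # second pass: the model can now self-correct
```

The second call simply re-sends the conversation with "Wait" appended as a new user turn, so no fine-tuning or extra tooling is assumed.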