Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing
- URL: http://arxiv.org/abs/2509.14263v1
- Date: Sat, 13 Sep 2025 16:57:32 GMT
- Title: Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing
- Authors: Luan Vejsiu, Qianyu Zheng, Haoxuan Chen, Yizhou Han
- Abstract summary: Despite ASR technology being adopted at full scale by industry and used by large portions of the population, ASR systems often produce errors that require post-editing. This paper introduces CEGER, a compact edit representation designed for highly accurate, efficient ASR post-editing. CEGER achieves state-of-the-art accuracy, attaining the lowest word error rate (WER) versus full rewrite and prior compact representations.
- Score: 3.219880761967806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite ASR technology being adopted at full scale by industry and used by large portions of the population, ASR systems often produce errors that require human post-editing. While LLMs are powerful post-editing tools, baseline full-rewrite models are inefficient at inference because they regenerate large amounts of unchanged text. Compact edit representations exist but often lack the efficacy and context required for optimal accuracy. This paper introduces CEGER (Context-Enhanced Granular Edit Representation), a compact edit representation designed for highly accurate, efficient ASR post-editing. CEGER allows LLMs to generate a sequence of structured, fine-grained, contextually rich commands that modify the original ASR output. A separate expansion module deterministically reconstructs the corrected text from these commands. In extensive experiments on the LibriSpeech dataset, CEGER achieves state-of-the-art accuracy, attaining the lowest word error rate (WER) versus full rewrite and prior compact representations.
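The command-and-expand scheme described in the abstract can be sketched as follows. The command format shown here (`sub`/`ins`/`del` tuples with token indices) is a hypothetical illustration for exposition only, not CEGER's actual representation, which is richer and context-enhanced.

```python
# Minimal sketch of a deterministic expansion module that applies
# granular edit commands to an ASR hypothesis. Illustrative only.

def expand(asr_tokens, commands):
    """Deterministically reconstruct corrected text from edit commands."""
    tokens = list(asr_tokens)
    # Apply commands right-to-left so earlier token indices stay valid.
    for cmd in sorted(commands, key=lambda c: c[1], reverse=True):
        op, i = cmd[0], cmd[1]
        if op == "sub":        # ("sub", i, new_word): replace token i
            tokens[i] = cmd[2]
        elif op == "ins":      # ("ins", i, word): insert before index i
            tokens.insert(i, cmd[2])
        elif op == "del":      # ("del", i): remove token i
            del tokens[i]
    return tokens

hyp = "the cat sad on the mat".split()
cmds = [("sub", 2, "sat"), ("ins", 6, "today")]
print(" ".join(expand(hyp, cmds)))  # the cat sat on the mat today
```

Because expansion is deterministic, the LLM only has to emit the short command sequence rather than rewrite the full utterance, which is the source of the inference savings the abstract describes.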
Related papers
- Model Editing for New Document Integration in Generative Information Retrieval [110.90609826290968]
Generative retrieval (GR) reformulates the Information Retrieval (IR) task as the generation of document identifiers (docIDs). Existing GR models exhibit poor generalization to newly added documents, often failing to generate the correct docIDs. We propose DOME, a novel method that effectively and efficiently adapts GR models to unseen documents.
arXiv Detail & Related papers (2026-03-03T09:13:38Z) - FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing [52.54102743380658]
We propose FlowDC, which decouples complex editing into multiple sub-editing effects and superposes them in parallel during the editing process. FlowDC shows superior results compared with existing methods.
arXiv Detail & Related papers (2025-12-12T09:08:39Z) - MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs [76.28901550926021]
Existing methods for lifelong model editing compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory while preserving the core capabilities of the pre-trained model. MEMOIR achieves state-of-the-art performance across reliability, generalization, and locality metrics, scaling to thousands of sequential edits with minimal forgetting.
arXiv Detail & Related papers (2025-06-09T16:16:42Z) - Constraining Sequential Model Editing with Editing Anchor Compression [40.93064933191375]
Large language models (LLMs) struggle with hallucinations due to false or outdated knowledge. This paper statistically observes that the parameter matrix after editing exhibits a significant deviation from its previous state as the number of edits increases. A framework termed Editing Anchor Compression (EAC) is proposed to constrain the deviation of the parameter matrix during sequential editing.
arXiv Detail & Related papers (2025-02-25T03:56:49Z) - Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications [4.751608548909266]
FineEdit is a specialized editing model explicitly trained for context-aware text modifications. FineEdit outperforms state-of-the-art models on single-turn edits, improving over Llama-3.2-3B by up to 30% and exceeding Mistral-7B-OpenOrca by over 40% on direct editing tasks.
arXiv Detail & Related papers (2025-02-19T01:41:44Z) - Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing [18.962260162806988]
Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. We propose alternative edit phrase representations inspired by phrase-based statistical machine translation.
arXiv Detail & Related papers (2025-01-23T16:54:27Z) - Assessing Human Editing Effort on LLM-Generated Texts via Compression-Based Edit Distance [2.1792283995628465]
Existing edit distance metrics, such as Levenshtein, BLEU, ROUGE, and TER, often fail to accurately measure the effort required for post-editing. We introduce a novel compression-based edit distance metric grounded in the Lempel-Ziv-77 algorithm.
arXiv Detail & Related papers (2024-12-23T06:29:25Z) - ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA [55.697627106315004]
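The intuition behind a compression-based edit distance can be illustrated with any LZ77-family compressor. The sketch below uses Python's `zlib` (DEFLATE, which is LZ77-based) and the generic normalized compression distance, as a stand-in for the paper's actual metric, not a reproduction of it.

```python
import zlib

def compressed_size(s: str) -> int:
    """Compressed size in bytes under DEFLATE (an LZ77-family codec)."""
    return len(zlib.compress(s.encode("utf-8"), 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small when y mostly reuses x."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

draft = "the quick brown fox jumps over the lazy dog"
light_edit = "the quick brown fox jumped over the lazy dog"
heavy_edit = "a completely different sentence altogether"
# A lightly post-edited text concatenates almost for free, so its
# distance to the draft is smaller than that of an unrelated rewrite.
print(ncd(draft, light_edit) < ncd(draft, heavy_edit))  # True
```

This captures why such metrics track post-editing effort better than pure token-level distances: reused spans compress away regardless of where they moved in the text.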
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Previous approaches manage sequential edits by freezing original parameters and discretely allocating new parameters for each knowledge update. We propose ELDER, a novel approach to create a continuous association between data and adapters.
arXiv Detail & Related papers (2024-08-19T02:27:00Z) - XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates [7.660511135287692]
This paper introduces XATU, the first benchmark specifically designed for fine-grained instruction-based explainable text editing.
XATU considers finer-grained text editing tasks of varying difficulty, incorporating lexical, syntactic, semantic, and knowledge-intensive edit aspects.
We demonstrate the effectiveness of instruction tuning and the impact of underlying architecture across various editing tasks.
arXiv Detail & Related papers (2023-09-20T04:58:59Z) - Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z) - FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up inference by 6-9 times while maintaining accuracy (8-14% WER reduction) compared with the autoregressive correction model. It outperforms popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
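The edit alignment that FastCorrect builds on can be illustrated with a standard token-level Levenshtein alignment between an ASR hypothesis and its reference. This is a generic dynamic-programming sketch, not the paper's model.

```python
def edit_alignment(src, tgt):
    """Token-level Levenshtein alignment as (op, src_tok, tgt_tok) triples."""
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete src token
                           dp[i][j - 1] + 1,         # insert tgt token
                           dp[i - 1][j - 1] + cost)  # keep or substitute
    # Backtrace to recover the operation sequence.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if src[i - 1] == tgt[j - 1] else 1):
            ops.append(("keep" if src[i - 1] == tgt[j - 1] else "sub", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", src[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, tgt[j - 1]))
            j -= 1
    return ops[::-1]

print(edit_alignment("i red the book".split(), "i read the book".split()))
# [('keep', 'i', 'i'), ('sub', 'red', 'read'), ('keep', 'the', 'the'), ('keep', 'book', 'book')]
```

Alignments like this give a correction model supervision about which tokens to keep, change, insert, or drop, which is what lets non-autoregressive decoding avoid regenerating the whole utterance.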
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.