Related papers: Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing

URL: http://arxiv.org/abs/2501.13831v1
Date: Thu, 23 Jan 2025 16:54:27 GMT
Title: Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing
Authors: Hao Zhang, Felix Stahlberg, Shankar Kumar
Abstract summary: Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. We propose alternative edit phrase representations inspired by phrase-based statistical machine translation.
Score: 18.962260162806988
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki (2023) proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and show that our target-phrase-only edit representation has the best efficiency-accuracy trade-off. On the LibriSpeech test set, our method closes 50-60% of the WER gap between the edit span model and the full rewrite model while losing only 10-20% of the length reduction rate of the edit span model.
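
To make the compression idea in the abstract concrete, here is a minimal sketch of emitting only the parts of a rewrite that differ from the input. It uses Python's standard `difflib` to derive word-level edit spans, rebuilds the full rewrite from them, and computes a length reduction rate. The function names, the `(start, end, replacement)` tuple format, and the cost of two index units per span are assumptions made for illustration; they are not the span representation of Kaneko and Okazaki (2023) or the phrasal representations proposed in this paper. The example sentences are also invented.

```python
from difflib import SequenceMatcher


def edit_spans(source, target):
    """Return (start, end, replacement_words) for every source span that differs from target."""
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have i1 == i2; deletions have an empty replacement.
            spans.append((i1, i2, target[j1:j2]))
    return spans


def apply_spans(source, spans):
    """Rebuild the full rewrite by copying unchanged source words around each span."""
    out = []
    cursor = 0
    for start, end, replacement in spans:
        out.extend(source[cursor:start])
        out.extend(replacement)
        cursor = end
    out.extend(source[cursor:])
    return out


def length_reduction(target, spans):
    """Fraction of output units saved, assuming each span costs 2 indices plus its words."""
    compact_len = sum(2 + len(replacement) for _, _, replacement in spans)
    return 1.0 - compact_len / len(target)


# Hypothetical ASR hypothesis and its corrected transcript.
hypothesis = "the cat sat on the matt and looked at a bird outside".split()
reference = "the cat sat on the mat and looked at the bird outside".split()

spans = edit_spans(hypothesis, reference)
restored = apply_spans(hypothesis, spans)
reduction = length_reduction(reference, spans)

print(spans)
print(" ".join(restored))
print(f"length reduction: {reduction:.0%}")
```

A model that predicts only `spans` decodes far fewer tokens than one that regenerates `reference`, which is the efficiency argument made in the abstract. Any error in a predicted index, however, corrupts the reconstruction, which is one plausible reason for an accuracy gap between compact and full-rewrite outputs.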

Related papers

Decomposing Reasoning Efficiency in Large Language Models [2.4149105714758545]
We decompose token efficiency into interpretable factors: completion under a fixed token budget, conditional correctness given completion, and verbosity. When reasoning traces are available, we add deterministic trace-quality measures to separate looping from verbose-but-engaged reasoning. Our decomposition reveals distinct bottleneck profiles that suggest different efficiency interventions.
arXiv Detail & Related papers (2026-02-10T14:09:18Z)
Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing [3.219880761967806]
Although ASR technology is widely adopted by industry and by large portions of the population, ASR systems still make errors that require editors to post-edit the text. This paper introduces CEGER, a compact edit representation designed for accurate and efficient ASR post-editing. CEGER achieves state-of-the-art accuracy, with the lowest word error rate (WER) compared with full rewrite and prior compact representations.
arXiv Detail & Related papers (2025-09-13T16:57:32Z)
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text and vision-to-text datasets. NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
arXiv Detail & Related papers (2024-11-08T20:11:24Z)
Reducing Sequence Length by Predicting Edit Operations with Large Language Models [50.66922361766939]
This paper proposes predicting edit spans for the source text for local sequence transduction tasks. We apply instruction tuning for Large Language Models on the supervision data of edit spans. Experiments show that the proposed method achieves comparable performance to the baseline in four tasks.
arXiv Detail & Related papers (2023-05-19T17:51:05Z)
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction [0.9502148118198473]
We propose PATCorrect, a novel non-autoregressive (NAR) approach to reduce word error rate (WER). We demonstrate that PATCorrect consistently outperforms the state-of-the-art NAR method on an English corpus across different upstream ASR systems.
arXiv Detail & Related papers (2023-02-10T04:05:24Z)
Converge to the Truth: Factual Error Correction via Iterative Constrained Editing [30.740281040892086]
We propose VENCE, a novel method for factual error correction (FEC) with minimal edits. VENCE formulates the FEC problem as iterative sampling editing actions with respect to a target density function. Experiments on a public dataset show that VENCE improves the well-adopted SARI metric by 5.3 (or a relative improvement of 11.8%) over the previous best distantly-supervised methods.
arXiv Detail & Related papers (2022-11-22T10:03:13Z)
Improving Factual Consistency in Summarization with Compression-Based Post-Editing [146.24839415743358]
We show that a model-agnostic way to address this problem is post-editing the generated summaries. We propose to use sentence-compression data to train the post-editing model to take a summary with extrinsic entity errors marked with special tokens. We show that this model improves factual consistency while maintaining ROUGE, improving entity precision by up to 30% on XSum, and that this model can be applied on top of another post-editor.
arXiv Detail & Related papers (2022-11-11T13:35:38Z)
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition. We present a dataset preparation method based on the granular alignment of ITN examples. One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process. RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary. Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system. We propose filtering algorithms to handle large context lists, and performance-balancing mechanisms to control the biasing degree of the model. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment. FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model. It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.