Correction Focused Language Model Training for Speech Recognition
- URL: http://arxiv.org/abs/2310.11003v1
- Date: Tue, 17 Oct 2023 05:10:39 GMT
- Title: Correction Focused Language Model Training for Speech Recognition
- Authors: Yingyi Ma, Zhe Liu, Ozlem Kalinli
- Abstract summary: We introduce a novel correction focused LM training approach that prioritizes ASR fallible words.
The word-level ASR fallibility score is defined and shaped as a prior word distribution to guide the LM training.
Compared with conventional LMs, correction focused training achieves up to a 5.5% relative word error rate (WER) reduction in sufficient-text scenarios.
- Score: 14.246583065323192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) have been commonly adopted to boost the performance of
automatic speech recognition (ASR) particularly in domain adaptation tasks.
The conventional approach to LM training treats all words in the corpora equally,
resulting in suboptimal improvements in ASR performance. In this work, we
introduce a novel correction focused LM training approach which aims to
prioritize ASR fallible words. The word-level ASR fallibility score,
representing the likelihood of ASR mis-recognition, is defined and shaped as a
prior word distribution to guide the LM training. To enable correction focused
training with text-only corpora, large language models (LLMs) are employed as
fallibility score predictors and text generators through multi-task
fine-tuning. Experimental results for domain adaptation tasks demonstrate the
effectiveness of our proposed method. Compared with conventional LMs,
correction focused training achieves up to a 5.5% relative word error rate
(WER) reduction in sufficient-text scenarios. In insufficient-text scenarios,
LM training with LLM-generated text achieves up to a 13% relative WER
reduction, and correction focused training obtains up to a further 6% relative
WER reduction.
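The abstract does not spell out how the fallibility scores enter the training objective, so the following is only a minimal sketch of one plausible realization, assuming the scores arrive as per-token weights and are folded into a weighted cross-entropy loss; the function name, the `alpha` knob, and the exact weighting form are illustrative assumptions, not the authors' formulation.

```python
# Hypothetical sketch: correction focused LM training as a fallibility-weighted
# cross-entropy loss. Score range, weighting scheme, and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn.functional as F

def correction_focused_loss(logits, targets, fallibility, alpha=1.0, pad_id=0):
    """Token-level cross-entropy weighted by ASR fallibility scores.

    logits:      (batch, seq_len, vocab) LM predictions
    targets:     (batch, seq_len) next-token ids
    fallibility: (batch, seq_len) scores in [0, 1]; higher means the token is
                 assumed more likely to be mis-recognized by the ASR system
    alpha:       strength of the correction focus (hypothetical knob)
    """
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
        ignore_index=pad_id,
    ).reshape(targets.shape)

    # Up-weight tokens the ASR system is likely to get wrong so the LM spends
    # more of its capacity on correcting them.
    weights = 1.0 + alpha * fallibility
    mask = (targets != pad_id).float()
    return (weights * ce * mask).sum() / mask.sum()
```

In the text-only setting described above, the per-word scores would presumably come from the multi-task fine-tuned LLM rather than from decoding real audio.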
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using proximal policy optimization (PPO) for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Multi-stage Large Language Model Correction for Speech Recognition [10.995600950995021]
We propose a novel multi-stage approach that utilizes uncertainty estimation of ASR outputs and the reasoning capability of large language models (LLMs).
Our experimental results demonstrate the effectiveness of the proposed method, showing a 10% to 20% relative improvement in WER over competitive ASR systems.
arXiv Detail & Related papers (2023-10-17T19:02:40Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition [32.24656612803592]
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
We present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction of the pretrained parameters.
The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors of 3.6 to 5.4 (a generic low-rank adaptation sketch appears after this list).
arXiv Detail & Related papers (2023-09-26T19:41:34Z)
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) method to develop low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of the original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- On Language Model Integration for RNN Transducer based Speech Recognition [49.84285563767935]
We study various internal language model (ILM) correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation of two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
- Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification [5.420446976940825]
We propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining.
The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method.
arXiv Detail & Related papers (2021-07-22T06:27:35Z)
- An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules.
The errors of the ASR system can seriously downgrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
arXiv Detail & Related papers (2021-03-25T05:15:43Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
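As referenced in the LoRB entry above, low-rank adaptation keeps the pretrained weights frozen and trains only a small low-rank update. The class below is a minimal, generic sketch of that idea, assuming a standard linear layer as the adaptation target; it is illustrative and not the LoRB authors' implementation.

```python
# Generic LoRA-style adapter: freeze the pretrained linear layer and learn a
# rank-r update B @ A on top of it. Rank and scaling are illustrative defaults.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen projection plus the trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

Only the lora_a and lora_b parameters are updated during adaptation, which is what keeps the trainable-parameter count at a small fraction of the full model.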