Low-rank Adaptation of Large Language Model Rescoring for
Parameter-Efficient Speech Recognition
- URL: http://arxiv.org/abs/2309.15223v2
- Date: Tue, 10 Oct 2023 06:30:47 GMT
- Title: Low-rank Adaptation of Large Language Model Rescoring for
Parameter-Efficient Speech Recognition
- Authors: Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar,
Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh
Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas
Stolcke, Ariya Rastrow, Ivan Bulyko
- Abstract summary: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
We present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction of the pretrained parameters.
The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors between 5.4 and 3.6.
- Score: 32.24656612803592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a neural language modeling system based on low-rank adaptation
(LoRA) for speech recognition output rescoring. Although pretrained language
models (LMs) like BERT have shown superior performance in second-pass
rescoring, the high computational cost of scaling up the pretraining stage and
adapting the pretrained models to specific domains limits their practical use in
rescoring. Here we present a method based on low-rank decomposition to train a
rescoring BERT model and adapt it to new domains using only a fraction (0.08%)
of the pretrained parameters. The inserted low-rank matrices are optimized through a
discriminative training objective along with a correlation-based regularization
loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is
evaluated on LibriSpeech and internal datasets, with training times reduced by
factors between 5.4 and 3.6.
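To make the low-rank decomposition concrete, below is a minimal PyTorch sketch of a LoRA-wrapped linear layer: the pretrained weight stays frozen and only the small inserted matrices A and B are trained. The class name, rank, and scaling are illustrative assumptions, not the authors' exact LoRB implementation, which additionally uses a discriminative training objective and a correlation-based regularization loss.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = Wx + (alpha / r) * B(Ax). Hyperparameters here are illustrative."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # inserted matrix A
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))        # inserted matrix B (zero init)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only lora_A and lora_B receive gradients; the frozen path is unchanged.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: wrap one projection of a rescoring BERT-style model.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 16, 768))   # (batch, seq_len, hidden)
```

Wrapping only a few projection matrices at a small rank keeps the trainable fraction in the sub-percent range, in line with the 0.08% figure quoted above; the exact fraction depends on the chosen rank and on which weights are wrapped.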
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Investigating Training Strategies and Model Robustness of Low-Rank
Adaptation for Language Modeling in Speech Recognition [27.515920408920216]
Low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) is a resource-efficient modeling approach for memory-constrained hardware.
In this study, we explore how to enhance model performance by introducing various LoRA training strategies.
To further characterize the stability of LoRA-based second-pass speech recognition models, we examine their robustness against input perturbations.
arXiv Detail & Related papers (2024-01-19T01:30:16Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
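As a loose illustration of the HyPoradise-style setup in the entry above, the sketch below formats an N-best list into an error-correction prompt for an LLM. The prompt wording and the hypotheses are invented for illustration; no particular LLM API or official benchmark template is assumed.

```python
def build_correction_prompt(nbest: list[str]) -> str:
    """Format an ASR N-best list into a prompt asking an LLM for the corrected
    transcription. The instruction wording is a made-up example."""
    hypotheses = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recognizer for one utterance.\n"
        f"{hypotheses}\n"
        "Report the most likely correct transcription, fixing errors even if the "
        "correct words do not appear in any hypothesis:"
    )

# Hypothetical N-best list for one utterance.
prompt = build_correction_prompt([
    "i saw the the cat on the mat",
    "i saw a cat on the mat",
    "eye saw the cat on the mat",
])
print(prompt)  # send this to any instruction-following LLM
```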
- Differentially Private Adapters for Parameter Efficient Acoustic
Modeling [24.72748979633543]
We introduce a noisy teacher-student ensemble into a conventional adaptation scheme.
We insert residual adapters between layers of the frozen pre-trained acoustic model.
Our solution reduces the number of trainable parameters by 97.5% using the residual adapters (RAs).
arXiv Detail & Related papers (2023-05-19T00:36:43Z)
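The residual adapters mentioned in the preceding entry are typically small bottleneck modules inserted between frozen layers. The sketch below shows that generic idea; the dimensions and initialization are assumptions, and the noisy teacher-student (differentially private) part of the paper is not represented here.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Small bottleneck adapter inserted between frozen layers: x + up(act(down(x)))."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # project down
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # project back up
        nn.init.zeros_(self.up.weight)                      # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Only adapter parameters are trained; the surrounding acoustic model stays frozen.
adapter = ResidualAdapter(hidden_dim=512)
out = adapter(torch.randn(4, 100, 512))   # (batch, frames, hidden)
```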
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) approach to develop low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation of a frozen pre-trained TTS model is developed using only 1.2% to 0.8% of the original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- Fast and accurate factorized neural transducer for text adaption of
end-to-end speech recognition models [23.21666928497697]
The improved adaptation ability of the factorized neural transducer (FNT) on text-only adaptation data comes at the cost of lower accuracy compared to the standard neural transducer model.
A combination of these approaches results in a relative word-error-rate reduction of 9.48% from the standard FNT model.
arXiv Detail & Related papers (2022-12-05T02:52:21Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
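To illustrate the factorization idea in the preceding entry, here is a toy sketch in which blank and vocabulary are scored by separate branches and the vocabulary branch adds log-probabilities from a standalone label-LM head. This is a simplified illustration rather than the exact factorized-transducer formulation; module names and dimensions are assumptions. Because the label LM is a standalone predictor, it can be fine-tuned on text-only data, which is what enables the adaptation described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFactorizedJoint(nn.Module):
    """Toy factorized-transducer output: a separate blank branch, plus a vocabulary
    branch whose logits are combined with a standalone label-LM head."""

    def __init__(self, enc_dim: int, pred_dim: int, vocab_size: int):
        super().__init__()
        self.blank_head = nn.Linear(enc_dim + pred_dim, 1)   # blank scored from both streams
        self.vocab_head = nn.Linear(enc_dim, vocab_size)     # acoustic contribution to labels
        self.label_lm = nn.Linear(pred_dim, vocab_size)      # stand-in for the standalone LM

    def forward(self, enc: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
        blank_logit = self.blank_head(torch.cat([enc, pred], dim=-1))
        vocab_logits = self.vocab_head(enc) + F.log_softmax(self.label_lm(pred), dim=-1)
        return torch.cat([blank_logit, vocab_logits], dim=-1)   # [blank, vocabulary]

# Text-only adaptation would fine-tune only label_lm on in-domain text.
joint = ToyFactorizedJoint(enc_dim=256, pred_dim=128, vocab_size=1000)
scores = joint(torch.randn(2, 256), torch.randn(2, 128))   # -> shape (2, 1001)
```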
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
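A common way to realize the Bayesian estimation mentioned in the preceding entry is variational inference with a Gaussian posterior over weights and reparameterized sampling. The sketch below shows that generic mechanism; it is not necessarily the paper's specific formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    """Linear layer with a Gaussian posterior over weights; each forward pass
    samples weights via the reparameterization trick."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -6.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        std = torch.exp(0.5 * self.w_logvar)
        w = self.w_mu + std * torch.randn_like(std)   # sample W ~ N(mu, sigma^2)
        return F.linear(x, w, self.bias)

    def kl_to_standard_normal(self) -> torch.Tensor:
        # KL(q(W) || N(0, I)), added to the training loss as a regularizer.
        return 0.5 * torch.sum(
            self.w_mu.pow(2) + self.w_logvar.exp() - 1.0 - self.w_logvar
        )
```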
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via
Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT -- a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
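For the layer-freezing analysis in the entry above, a common recipe is to freeze the embeddings and lower encoder layers of a pretrained BERT and fine-tune only the top layers. The model name and the number of frozen layers below are illustrative choices, not the paper's exact configuration.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# Freeze the embeddings and the lowest 8 of 12 encoder layers;
# only the top layers (and any task head) remain trainable.
for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable / total:.1%} of {total}")
```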
This list is automatically generated from the titles and abstracts of the papers in this site.