Investigating Training Strategies and Model Robustness of Low-Rank
Adaptation for Language Modeling in Speech Recognition
- URL: http://arxiv.org/abs/2401.10447v1
- Date: Fri, 19 Jan 2024 01:30:16 GMT
- Title: Investigating Training Strategies and Model Robustness of Low-Rank
Adaptation for Language Modeling in Speech Recognition
- Authors: Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen,
Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya
Rastrow, Jia Xu, Ivan Bulyko, Andreas Stolcke
- Abstract summary: Low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) is a resource-efficient modeling approach for memory-constrained hardware.
In this study, we explore how to enhance model performance by introducing various LoRA training strategies.
To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations.
- Score: 27.515920408920216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models
(PLMs) has become increasingly popular as a mainstream, resource-efficient
modeling approach for memory-constrained hardware. In this study, we first
explore how to enhance model performance by introducing various LoRA training
strategies, achieving relative word error rate reductions of 3.50% on the
public LibriSpeech dataset and of 3.67% on an internal dataset in the
messaging domain. To further characterize the stability of LoRA-based
second-pass speech recognition models, we examine robustness against input
perturbations. These perturbations are rooted in homophone replacements, and we
introduce a novel metric, N-best Perturbation-based Rescoring Robustness (NPRR),
to measure the relative degradation in the performance of rescoring
models. Our experimental results indicate that while advanced variants of LoRA,
such as dynamic rank-allocated LoRA, lead to performance degradation in
1-best perturbation, they alleviate the degradation in N-best perturbation.
Compared with fully fine-tuned models and vanilla LoRA tuning baselines, this
finding suggests that a comprehensive selection of adaptation method is needed
when using LoRA-based adaptation for compute-cost savings and robust language
modeling.
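As a concrete reference for the adaptation scheme the abstract describes, the sketch below shows a generic LoRA layer wrapping a frozen linear projection of a pretrained LM, written in PyTorch. It is a minimal illustration of standard LoRA, not the paper's exact training strategies; the rank r, scaling alpha, and initialization choices are assumptions made for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Computes y = W x + (alpha / r) * B A x, where W stays frozen and only
    the small matrices A (r x in_features) and B (out_features x r) train.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep the PLM weights frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T
```

Because B starts at zero, training begins exactly at the pretrained rescoring model, and only the low-rank factors are updated, which is what keeps memory and compute requirements low on constrained hardware.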
Related papers
- Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models [13.56631686493347]
Large language models (LLMs) exhibit remarkable capabilities in natural language processing but face catastrophic forgetting when learning new tasks.
We propose Controlled LoRA (CLoRA), a subspace regularization method on LoRA structure.
arXiv Detail & Related papers (2024-10-22T08:27:23Z) - Learning on LoRAs: GL-Equivariant Processing of Low-Rank Weight Spaces for Large Finetuned Models [38.197552424549514]
Low-rank adaptations (LoRAs) have revolutionized the finetuning of large foundation models.
LoRAs present opportunities for applying machine learning techniques that take these low-rank weights themselves as inputs.
In this paper, we investigate the potential of Learning on LoRAs (LoL), a paradigm where LoRA weights serve as input to machine learning models.
arXiv Detail & Related papers (2024-10-05T15:52:47Z) - UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG).
We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.
UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z) - Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
arXiv Detail & Related papers (2024-07-01T17:28:59Z) - OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models [0.0]
Low-Rank Adaptation (LoRA) has emerged as a promising method to mitigate the computational and memory costs of fully fine-tuning large language models.
OLoRA significantly accelerates the convergence of LLM training.
OLoRA exhibits improved performance compared to standard LoRA across a variety of language modeling tasks.
arXiv Detail & Related papers (2024-06-03T20:37:27Z) - Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z) - Unleashing the Power of Pre-trained Language Models for Offline
Reinforcement Learning [54.682106515794864]
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
This paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to use pre-trained Language Models (LMs) for offline RL.
Empirical results indicate LaMo achieves state-of-the-art performance in sparse-reward tasks.
arXiv Detail & Related papers (2023-10-31T16:24:17Z) - Low-rank Adaptation of Large Language Model Rescoring for
Parameter-Efficient Speech Recognition [32.24656612803592]
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.
We present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction of the pretrained parameters.
The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors of 3.6 to 5.4; a minimal second-pass rescoring sketch is given after this list.
arXiv Detail & Related papers (2023-09-26T19:41:34Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
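Several of the entries above, like the main paper, concern second-pass rescoring of ASR N-best lists with a (LoRA-adapted) language model. The sketch below shows the generic score-interpolation step such rescorers perform; the interpolation weight and the toy LM scorer are illustrative assumptions, not values from any of the papers listed.

```python
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],        # (hypothesis, first-pass ASR score), log-domain
    lm_score: Callable[[str], float],      # second-pass LM score, e.g. from a LoRA-adapted PLM
    weight: float = 0.5,                   # LM interpolation weight (illustrative)
) -> str:
    """Pick the hypothesis with the highest interpolated score.

    combined = asr_score + weight * lm_score(hypothesis)
    """
    best_hyp, best_score = "", float("-inf")
    for hyp, asr_score in nbest:
        combined = asr_score + weight * lm_score(hyp)
        if combined > best_score:
            best_hyp, best_score = hyp, combined
    return best_hyp

# Toy usage with a length-penalty stand-in for a real LM scorer.
nbest = [("i scream for ice cream", -12.3), ("eye scream for ice cream", -11.9)]
print(rescore_nbest(nbest, lm_score=lambda h: -0.1 * len(h.split())))
```

Homophone perturbations of the kind studied in the main paper swap words in such hypotheses for same-sounding alternatives (for example, "eye" for "i"); the NPRR metric then tracks how much the rescoring model's output quality degrades relative to the unperturbed N-best list.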