Adaptable Multi-Domain Language Model for Transformer ASR
- URL: http://arxiv.org/abs/2008.06208v2
- Date: Thu, 11 Feb 2021 03:17:30 GMT
- Title: Adaptable Multi-Domain Language Model for Transformer ASR
- Authors: Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok
Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong,
Jihyun Lee, Hosik Lee, Young Sang Choi
- Abstract summary: The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model.
The proposed model is also effective in reducing model maintenance cost, because the costly and time-consuming common LM pre-training process can be omitted.
- Score: 16.8397357399749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an adapter-based multi-domain Transformer language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters, and it can perform multi-domain adaptation with only the small adapters and their related layers. The proposed model can also reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model. The proposed LM can be expanded to new domains by adding about 2% of parameters for the first domain and about 13% of parameters from the second domain onward. The proposed model is also effective in reducing model maintenance cost, because the costly and time-consuming common LM pre-training process can be omitted. Using the proposed adapter-based approach, we observe that a general LM with an adapter can outperform a dedicated music-domain LM in terms of word error rate (WER).
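The core idea can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' code: a frozen common Transformer LM is augmented with a small bottleneck adapter per domain, and only the adapters (and their related layers) are trained when a new domain is added. Layer sizes and adapter placement are assumptions.

```python
# Illustrative sketch only, not the authors' implementation.
# A frozen "common" Transformer LM plus one small adapter stack per domain.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(self.norm(x))))


class MultiDomainLM(nn.Module):
    """Frozen common Transformer LM with per-domain adapter stacks."""

    def __init__(self, common_lm: nn.Module, d_model: int, num_layers: int, domains):
        super().__init__()
        self.common_lm = common_lm
        for p in self.common_lm.parameters():
            p.requires_grad = False  # the big common LM stays fixed
        # One small adapter stack per domain; only these are trained per domain.
        self.adapters = nn.ModuleDict({
            d: nn.ModuleList(Adapter(d_model) for _ in range(num_layers))
            for d in domains
        })

    def forward(self, hidden_states, domain: str):
        # hidden_states: per-layer outputs of the common LM (assumed interface).
        return [adapter(h) for h, adapter in zip(hidden_states, self.adapters[domain])]
```

Because the common LM is shared and frozen, adding a domain only adds the adapter parameters, which is consistent with the small per-domain overhead reported in the abstract.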
Related papers
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA [38.30350849992281]
"Recursive" language models share parameters across layers with minimal loss of performance.
Recursive Transformers are efficiently initialized from standard pretrained Transformers, but use only a single block of unique layers that is then repeated multiple times in a loop.
We show that our models outperform both similar-sized vanilla pretrained models and knowledge distillation baselines.
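As a rough sketch of the parameter-sharing idea (illustrative only; the paper's layer-wise LoRA relaxation is omitted):

```python
# One shared Transformer block applied repeatedly, so depth does not
# multiply the parameter count. Dimensions are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_loops = 512, 8, 6
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

def recursive_encode(x: torch.Tensor) -> torch.Tensor:
    # The same block (same weights) is reused n_loops times in a loop.
    for _ in range(n_loops):
        x = shared_block(x)
    return x

h = recursive_encode(torch.randn(2, 10, d_model))  # (batch, seq, d_model)
```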
arXiv Detail & Related papers (2024-10-28T02:15:45Z)
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have a huge number of parameters, so fine-tuning them is often expensive and time-consuming.
It is necessary to adopt a parameter-efficient approach that reduces the number of parameters updated when fine-tuning PLMs without compromising their performance on downstream tasks.
In this paper, we design a novel adapter that acts only on the self-attention outputs in PLMs.
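A hedged sketch of what such an adapter could look like, assuming it applies an element-wise (Hadamard) scale and bias to the self-attention output; the paper's exact formulation may differ:

```python
# Hedged sketch: a per-dimension (Hadamard) scale and shift applied to the
# self-attention output; only these two vectors would be trained.
import torch
import torch.nn as nn

class HadamardAdapter(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d_model))   # element-wise weight
        self.shift = nn.Parameter(torch.zeros(d_model))  # element-wise bias

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        # Hadamard (element-wise) product with the attention output, plus bias.
        return attn_out * self.scale + self.shift
```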
arXiv Detail & Related papers (2024-07-04T18:21:28Z)
- Plug-and-Play Transformer Modules for Test-Time Adaptation [54.80435317208111]
We introduce PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy.
We pre-train a large set of modules, each specialized for different source domains.
We harness multiple highly relevant source domains in a single inference call.
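One plausible reading of the module-combination step, with the scoring and combination rules assumed purely for illustration:

```python
# Hedged illustration only: select the top-k source-domain modules for a batch
# and average their outputs. The actual selection/combination rule is assumed.
import torch
import torch.nn as nn

def adapt_with_modules(x, modules, scores, k=2):
    # scores[i]: assumed relevance of module i to this batch (higher = better).
    top = torch.topk(scores, k).indices.tolist()
    outs = torch.stack([modules[i](x) for i in top])  # (k, batch, ...)
    return outs.mean(dim=0)                           # combine the top-k modules
```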
arXiv Detail & Related papers (2024-01-06T00:24:50Z)
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch [72.97553348776425]
We introduce DARE to set most delta parameters to zero without affecting the abilities of Supervised Fine-Tuning (SFT) LMs.
Then, we use DARE as a versatile plug-in to sparsify the delta parameters of multiple SFT models and merge them into a single model.
We also utilize DARE to create a merged LM that ranks first among models with 7 billion parameters on the Open LLM Leaderboard.
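The drop-and-rescale operation on delta parameters (fine-tuned minus base weights) can be sketched as below; the merge rule shown is a simple average and only one of several possibilities:

```python
# Sketch of drop-and-rescale on delta parameters, followed by merging several
# sparsified deltas back into the base weights.
import torch

def dare(delta: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    # Randomly zero out most delta parameters, then rescale the survivors
    # by 1 / (1 - drop_rate) to keep the expected update unchanged.
    mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return delta * mask / (1.0 - drop_rate)

def merge(base: torch.Tensor, finetuned_list) -> torch.Tensor:
    deltas = [dare(ft - base) for ft in finetuned_list]
    return base + torch.stack(deltas).mean(dim=0)  # simple average of sparsified deltas
```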
arXiv Detail & Related papers (2023-11-06T13:43:07Z)
- CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models [43.28607973774104]
Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model.
We present a lightweight method for adapting large LMs to new domains and tasks, assuming no access to their weights or intermediate activations.
Our approach fine-tunes a small white-box LM and combines it with the large black-box LM at the probability level through a small network.
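A minimal sketch of probability-level combination with a learned mixing weight; the gating features and network shape are assumptions, not the paper's design:

```python
# Hedged sketch: combine next-token probabilities from a small fine-tuned LM
# and a large black-box LM with a learned, context-dependent mixing weight.
import torch
import torch.nn as nn

class ProbCombiner(nn.Module):
    def __init__(self, d_feat: int):
        super().__init__()
        # Tiny network producing an interpolation weight in (0, 1).
        self.gate = nn.Sequential(nn.Linear(d_feat, 1), nn.Sigmoid())

    def forward(self, p_small, p_large, feats):
        # p_small, p_large: (batch, vocab) probability distributions.
        w = self.gate(feats)                  # (batch, 1)
        return w * p_small + (1 - w) * p_large
```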
arXiv Detail & Related papers (2023-05-23T06:32:55Z)
- Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition [2.5680214354539803]
We propose a solution that reuses blocks in Transformer models for ASR on edge devices.
Specifically, we design a novel block-reusing strategy for the speech Transformer (BRST) to enhance the effectiveness of its parameters.
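A hedged sketch of block reusing with per-repetition adapters; dimensions and adapter shape are illustrative, not the paper's configuration:

```python
# Hedged sketch: one shared encoder block reused several times, with a small
# adapter for each repetition.
import torch
import torch.nn as nn

class ReusedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_repeats=6, bottleneck=32):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_repeats))

    def forward(self, x):
        for adapter in self.adapters:
            x = self.block(x)        # shared weights across repetitions
            x = x + adapter(x)       # small per-repetition adapter
        return x
```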
arXiv Detail & Related papers (2023-03-23T06:54:37Z)
- AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models [127.04370753583261]
Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains.
A solution is to use a related-domain adapter for the novel domain at test time.
We introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains.
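Weight-space averaging itself is straightforward; the sketch below averages adapter state dicts, while the method's choice of which adapters to average (e.g., by domain similarity) is omitted:

```python
# Sketch of weight-space averaging: average the parameters of several domain
# adapters into one adapter used for the novel domain at test time.
import torch

def average_adapters(state_dicts):
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0) for k in keys}
```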
arXiv Detail & Related papers (2023-02-14T13:09:23Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity without increasing parameters or computational cost, based on two key techniques.
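A hedged sketch of one common reading: several adapter experts with stochastic routing during training, merged by weight averaging for inference so serving cost matches a single adapter; the paper's exact techniques may differ:

```python
# Hedged sketch: multiple adapter "experts" during training (stochastic
# routing), merged by weight averaging for inference.
import copy
import random
import torch
import torch.nn as nn

class AdapterMixture(nn.Module):
    def __init__(self, d_model=768, bottleneck=16, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        expert = random.choice(self.experts)  # stochastic routing during training
        return x + expert(x)

    @torch.no_grad()
    def merged_adapter(self) -> nn.Module:
        # Average the experts' weights into a single adapter for inference.
        merged = copy.deepcopy(self.experts[0])
        for p_m, *p_e in zip(merged.parameters(),
                             *[e.parameters() for e in self.experts]):
            p_m.copy_(torch.stack(p_e).mean(dim=0))
        return merged
```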
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering [75.86788916930377]
A bilaterally slimmable Transformer (BST) can be integrated into arbitrary Transformer-based VQA models.
One slimmed MCAN-BST submodel achieves comparable accuracy on VQA-v2.
The smallest MCAN-BST submodel has 9M parameters and 0.16G FLOPs during inference.
arXiv Detail & Related papers (2022-03-24T02:26:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.