Adaptable Multi-Domain Language Model for Transformer ASR
- URL: http://arxiv.org/abs/2008.06208v2
- Date: Thu, 11 Feb 2021 03:17:30 GMT
- Title: Adaptable Multi-Domain Language Model for Transformer ASR
- Authors: Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok
Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong,
Jihyun Lee, Hosik Lee, Young Sang Choi
- Abstract summary: The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model.
The proposed model is also effective in reducing model maintenance cost, because the costly and time-consuming common LM pre-training process can be omitted.
- Score: 16.8397357399749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an adapter-based multi-domain Transformer language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters, and it can perform multi-domain adaptation with only the small adapters and their related layers. The proposed model can also reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model. The proposed LM can be expanded to new domains by adding about 2% of parameters for the first domain and about 13% of parameters from the second domain onward. The proposed model is also effective in reducing model maintenance cost, because the costly and time-consuming common LM pre-training process can be omitted. Using the proposed adapter-based approach, we observe that a general LM with an adapter can outperform a dedicated music-domain LM in terms of word error rate (WER).
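The core idea can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' code: a frozen common Transformer LM is augmented with a small bottleneck adapter per domain, and only the adapters (and their related layers) are trained when a new domain is added. Layer sizes and adapter placement are assumptions.

```python
# Illustrative sketch only, not the authors' implementation.
# A frozen "common" Transformer LM plus one small adapter stack per domain.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(self.norm(x))))


class MultiDomainLM(nn.Module):
    """Frozen common Transformer LM with per-domain adapter stacks."""

    def __init__(self, common_lm: nn.Module, d_model: int, num_layers: int, domains):
        super().__init__()
        self.common_lm = common_lm
        for p in self.common_lm.parameters():
            p.requires_grad = False  # the big common LM stays fixed
        # One small adapter stack per domain; only these are trained per domain.
        self.adapters = nn.ModuleDict({
            d: nn.ModuleList(Adapter(d_model) for _ in range(num_layers))
            for d in domains
        })

    def forward(self, hidden_states, domain: str):
        # hidden_states: per-layer outputs of the common LM (assumed interface).
        return [adapter(h) for h, adapter in zip(hidden_states, self.adapters[domain])]
```

Because the common LM is shared and frozen, adding a domain only adds the adapter parameters, which is consistent with the small per-domain overhead reported in the abstract.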
Related papers
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA [38.30350849992281]
"Recursive" language models share parameters across layers with minimal loss of performance.
Recursive Transformers are efficiently initialized from standard pretrained Transformers, but use only a single block of unique layers that is then repeated multiple times in a loop.
We show that our models outperform both similar-sized vanilla pretrained models and knowledge distillation baselines.
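As a rough sketch of the parameter-sharing idea (illustrative only; the paper's layer-wise LoRA relaxation is omitted):

```python
# One shared Transformer block applied repeatedly, so depth does not
# multiply the parameter count. Dimensions are illustrative.
import torch
import torch.nn as nn

d_model, n_heads, n_loops = 512, 8, 6
shared_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

def recursive_encode(x: torch.Tensor) -> torch.Tensor:
    # The same block (same weights) is reused n_loops times in a loop.
    for _ in range(n_loops):
        x = shared_block(x)
    return x

h = recursive_encode(torch.randn(2, 10, d_model))  # (batch, seq, d_model)
```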
arXiv Detail & Related papers (2024-10-28T02:15:45Z)
- Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models [108.08773541490191]
Pre-trained language models (PLMs) have a huge number of parameters, so fine-tuning them is often expensive and time-consuming.
It is necessary to adopt a parameter-efficient approach that reduces the number of parameters updated when fine-tuning PLMs without compromising their performance on downstream tasks.
In this paper, we design a novel adapter that acts only on the self-attention outputs in PLMs.
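A hedged sketch of what such an adapter could look like, assuming it applies an element-wise (Hadamard) scale and bias to the self-attention output; the paper's exact formulation may differ:

```python
# Hedged sketch: a per-dimension (Hadamard) scale and shift applied to the
# self-attention output; only these two vectors would be trained.
import torch
import torch.nn as nn

class HadamardAdapter(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(d_model))   # element-wise weight
        self.shift = nn.Parameter(torch.zeros(d_model))  # element-wise bias

    def forward(self, attn_out: torch.Tensor) -> torch.Tensor:
        # Hadamard (element-wise) product with the attention output, plus bias.
        return attn_out * self.scale + self.shift
```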
arXiv Detail & Related papers (2024-07-04T18:21:28Z)
- Plug-and-Play Transformer Modules for Test-Time Adaptation [54.80435317208111]
We introduce PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy.
We pre-train a large set of modules, each specialized for different source domains.
We harness multiple highly relevant source domains in a single inference call.
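One plausible reading of the module-combination step, with the scoring and combination rules assumed purely for illustration:

```python
# Hedged illustration only: select the top-k source-domain modules for a batch
# and average their outputs. The actual selection/combination rule is assumed.
import torch
import torch.nn as nn

def adapt_with_modules(x, modules, scores, k=2):
    # scores[i]: assumed relevance of module i to this batch (higher = better).
    top = torch.topk(scores, k).indices.tolist()
    outs = torch.stack([modules[i](x) for i in top])  # (k, batch, ...)
    return outs.mean(dim=0)                           # combine the top-k modules
```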
arXiv Detail & Related papers (2024-01-06T00:24:50Z)
- Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch [72.97553348776425]
We introduce DARE to set most delta parameters to zero without affecting the abilities of Supervised Fine-Tuning (SFT) LMs.
Then, we use DARE as a versatile plug-in to sparsify the delta parameters of multiple SFT models and merge them into a single model.
We also utilize DARE to create a merged LM that ranks first among models with 7 billion parameters on the Open LLM Leaderboard.
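The drop-and-rescale operation on delta parameters (fine-tuned minus base weights) can be sketched as below; the merge rule shown is a simple average and only one of several possibilities:

```python
# Sketch of drop-and-rescale on delta parameters, followed by merging several
# sparsified deltas back into the base weights.
import torch

def dare(delta: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    # Randomly zero out most delta parameters, then rescale the survivors
    # by 1 / (1 - drop_rate) to keep the expected update unchanged.
    mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    return delta * mask / (1.0 - drop_rate)

def merge(base: torch.Tensor, finetuned_list) -> torch.Tensor:
    deltas = [dare(ft - base) for ft in finetuned_list]
    return base + torch.stack(deltas).mean(dim=0)  # simple average of sparsified deltas
```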
arXiv Detail & Related papers (2023-11-06T13:43:07Z)
- CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models [43.28607973774104]
Methods for adapting language models (LMs) to new tasks and domains have traditionally assumed white-box access to the model.
We present a lightweight method for adapting large LMs to new domains and tasks, assuming no access to their weights or intermediate activations.
Our approach fine-tunes a small white-box LM and combines it with the large black-box LM at the probability level through a small network.
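A minimal sketch of probability-level combination with a learned mixing weight; the gating features and network shape are assumptions, not the paper's design:

```python
# Hedged sketch: combine next-token probabilities from a small fine-tuned LM
# and a large black-box LM with a learned, context-dependent mixing weight.
import torch
import torch.nn as nn

class ProbCombiner(nn.Module):
    def __init__(self, d_feat: int):
        super().__init__()
        # Tiny network producing an interpolation weight in (0, 1).
        self.gate = nn.Sequential(nn.Linear(d_feat, 1), nn.Sigmoid())

    def forward(self, p_small, p_large, feats):
        # p_small, p_large: (batch, vocab) probability distributions.
        w = self.gate(feats)                  # (batch, 1)
        return w * p_small + (1 - w) * p_large
```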
arXiv Detail & Related papers (2023-05-23T06:32:55Z)
- Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition [2.5680214354539803]
We propose a solution that reuses blocks in Transformer models for ASR on edge devices.
Specifically, we design a novel block-reusing strategy for the speech Transformer (BRST) to enhance the effectiveness of its parameters.
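A hedged sketch of block reusing with per-repetition adapters; dimensions and adapter shape are illustrative, not the paper's configuration:

```python
# Hedged sketch: one shared encoder block reused several times, with a small
# adapter for each repetition.
import torch
import torch.nn as nn

class ReusedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_repeats=6, bottleneck=32):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.adapters = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.ReLU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_repeats))

    def forward(self, x):
        for adapter in self.adapters:
            x = self.block(x)        # shared weights across repetitions
            x = x + adapter(x)       # small per-repetition adapter
        return x
```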
arXiv Detail & Related papers (2023-03-23T06:54:37Z)
- AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models [127.04370753583261]
Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains.
A solution is to use a related-domain adapter for the novel domain at test time.
We introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains.
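Weight-space averaging itself is straightforward; the sketch below averages adapter state dicts, while the method's choice of which adapters to average (e.g., by domain similarity) is omitted:

```python
# Sketch of weight-space averaging: average the parameters of several domain
# adapters into one adapter used for the novel domain at test time.
import torch

def average_adapters(state_dicts):
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0) for k in keys}
```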
arXiv Detail & Related papers (2023-02-14T13:09:23Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism that improves adapter capacity without increasing parameters or computational cost, based on two key techniques.
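A hedged sketch of one common reading: several adapter experts with stochastic routing during training, merged by weight averaging for inference so serving cost matches a single adapter; the paper's exact techniques may differ:

```python
# Hedged sketch: multiple adapter "experts" during training (stochastic
# routing), merged by weight averaging for inference.
import copy
import random
import torch
import torch.nn as nn

class AdapterMixture(nn.Module):
    def __init__(self, d_model=768, bottleneck=16, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, bottleneck), nn.GELU(),
                          nn.Linear(bottleneck, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        expert = random.choice(self.experts)  # stochastic routing during training
        return x + expert(x)

    @torch.no_grad()
    def merged_adapter(self) -> nn.Module:
        # Average the experts' weights into a single adapter for inference.
        merged = copy.deepcopy(self.experts[0])
        for p_m, *p_e in zip(merged.parameters(),
                             *[e.parameters() for e in self.experts]):
            p_m.copy_(torch.stack(p_e).mean(dim=0))
        return merged
```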
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering [75.86788916930377]
A bilaterally slimmable Transformer (BST) can be integrated into arbitrary Transformer-based VQA models.
One slimmed MCAN-BST submodel achieves comparable accuracy on VQA-v2.
The smallest MCAN-BST submodel has 9M parameters and 0.16G FLOPs during inference.
arXiv Detail & Related papers (2022-03-24T02:26:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.