Domain Generalization Using Large Pretrained Models with
Mixture-of-Adapters
- URL: http://arxiv.org/abs/2310.11031v1
- Date: Tue, 17 Oct 2023 07:01:24 GMT
- Title: Domain Generalization Using Large Pretrained Models with
Mixture-of-Adapters
- Authors: Gyuseong Lee, Wooseok Jang, Jin Hyeon Kim, Jaewoo Jung, Seungryong Kim
- Abstract summary: Domain generalization (DG) algorithms aim to maintain the performance of a trained model on distributions not seen during training.
We propose a mixture-of-experts-based adapter fine-tuning method, dubbed mixture-of-adapters (MoA).
- Score: 35.834509022013435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a vision model that remains robust under large distribution
shifts is essential for model deployment in real-world settings. In particular,
domain generalization (DG) algorithms aim to maintain the performance of a
trained model on distributions that were not seen during training. One of the
most effective approaches is to leverage the rich knowledge already learned by
large pretrained models. However, naively fine-tuning large models on DG tasks
is often practically infeasible due to memory limitations, long training times,
and the risk of degrading the learned knowledge. Recently, parameter-efficient
fine-tuning (PEFT) methods have been proposed to reduce the high computational
cost of training and to adapt large models to downstream tasks efficiently. In
this work, we find for the first time that the use of adapters in PEFT not only
reduces the computational cost of training but also serves as an effective
regularizer for DG tasks. Surprisingly, even a naive adapter implementation on
top of a large model achieves superior performance on common datasets. Under
large distribution shifts, however, additional factors, such as the amount of
regularization appropriate to the strength of the shift, must be considered in
a more sophisticated adapter design. To address this, we propose a
mixture-of-experts-based adapter fine-tuning method, dubbed mixture-of-adapters
(MoA). Specifically, we employ multiple adapters with varying capacities and
use learnable routers to assign each token to an appropriate adapter. By
combining PEFT with MoA, we effectively alleviate the performance deterioration
caused by distribution shifts and achieve state-of-the-art performance on
diverse DG benchmarks.
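To make the routing idea concrete, the sketch below shows one way a mixture-of-adapters layer could be wired up in PyTorch: several bottleneck adapters of different capacities sit behind a learnable per-token router, and their outputs are mixed back into the frozen backbone features through a residual connection. The class names, bottleneck sizes, and the soft (softmax) routing are illustrative assumptions, not the authors' implementation.

    # Minimal mixture-of-adapters sketch (assumed PyTorch bottleneck-adapter design;
    # names, sizes, and soft routing are illustrative, not the paper's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BottleneckAdapter(nn.Module):
        """Standard down-project / nonlinearity / up-project adapter."""
        def __init__(self, dim: int, bottleneck: int):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.up(F.gelu(self.down(x)))

    class MixtureOfAdapters(nn.Module):
        """Routes each token to adapters of varying capacity via a learnable router."""
        def __init__(self, dim: int, bottlenecks=(16, 64, 256)):
            super().__init__()
            self.experts = nn.ModuleList(
                [BottleneckAdapter(dim, b) for b in bottlenecks]
            )
            self.router = nn.Linear(dim, len(bottlenecks))  # per-token gating logits

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim) features from the frozen pretrained backbone.
            gates = F.softmax(self.router(x), dim=-1)                        # (B, T, E)
            expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, D, E)
            mixed = (expert_out * gates.unsqueeze(2)).sum(dim=-1)            # weighted sum over experts
            return x + mixed  # residual keeps the pretrained features intact

In such a setup only the adapters and the router would be trained while the backbone stays frozen, which is what keeps the fine-tuning parameter-efficient; the paper's actual routing scheme and capacity choices may differ.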
Related papers
- Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning [55.384428765798496]
New data exhibits a long-tailed distribution, such as e-commerce platform reviews.
This necessitates continually learning from imbalanced data without forgetting.
We introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.
arXiv Detail & Related papers (2024-09-11T17:52:00Z) - FL-TAC: Enhanced Fine-Tuning in Federated Learning via Low-Rank, Task-Specific Adapter Clustering [12.417857960556155]
Federated Learning (FL) offers a promising solution by enabling fine-tuning across large-scale clients with a variety of task data.
This paper addresses the high communication cost for fine-tuning large pre-trained models within FL frameworks through low-rank fine-tuning.
arXiv Detail & Related papers (2024-04-23T10:50:38Z) - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low compression rate and high compression rate.
arXiv Detail & Related papers (2024-03-14T09:06:49Z) - TAIL: Task-specific Adapters for Imitation Learning with Large
Pretrained Models [32.83440439290383]
We introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks.
Inspired by recent advancements in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques.
Our experiments in large-scale language-conditioned manipulation tasks suggest that TAIL with LoRA can achieve the best post-adaptation performance.
arXiv Detail & Related papers (2023-10-09T17:49:50Z) - MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z) - A Comprehensive Analysis of Adapter Efficiency [20.63580880344425]
We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency in adapters does not translate to efficiency gains compared to full fine-tuning of models.
We recommend that for moderately sized models for NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than using adapters.
arXiv Detail & Related papers (2023-05-12T14:05:45Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - To Adapt or to Fine-tune: A Case Study on Abstractive Summarization [7.353994554197792]
Recent advances in the field of abstractive summarization leverage pre-trained language models rather than train a model from scratch.
Such models are sluggish to train and accompanied by a massive overhead.
It remains uncertain whether using adapters benefits the task of summarization, in terms of improved efficiency without an unpleasant sacrifice in performance.
arXiv Detail & Related papers (2022-08-30T22:48:28Z) - AdapterBias: Parameter-efficient Token-dependent Representation Shift
for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.