Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
- URL: http://arxiv.org/abs/2310.11031v1
- Date: Tue, 17 Oct 2023 07:01:24 GMT
- Title: Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
- Authors: Gyuseong Lee, Wooseok Jang, Jin Hyeon Kim, Jaewoo Jung, Seungryong Kim
- Abstract summary: Domain generalization (DG) algorithms aim to maintain the performance of a trained model on different distributions.
We propose a mixture-of-expert based adapter fine-tuning method, dubbed mixture-of-adapters (MoA).
- Score: 35.834509022013435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning a vision model that remains robust under large distribution
shifts is essential for model deployment in real-world settings. In particular,
domain generalization (DG) algorithms aim to maintain the performance of a
trained model on distributions that were not seen during training. One of the
most effective approaches is to leverage the rich knowledge already learned by
large pretrained models. However, naively fine-tuning large models for DG tasks
is often practically infeasible due to memory limitations, extensive training
time, and the risk of deteriorating the learned knowledge. Recently,
parameter-efficient fine-tuning (PEFT) methods have been proposed to reduce the
high computational cost of training and to adapt large models to downstream
tasks efficiently. In this work, we find for the first time that the use of
adapters in PEFT methods not only reduces the high computational cost of
training but also serves as an effective regularizer for DG tasks.
Surprisingly, a naive adapter implementation for large models achieves superior
performance on common datasets. However, under large distribution shifts,
additional factors, such as the optimal amount of regularization relative to
the strength of the shift, must be considered, calling for a more
sophisticated adapter design. To address this, we propose a mixture-of-expert
based adapter fine-tuning method, dubbed mixture-of-adapters (MoA).
Specifically, we employ multiple adapters with varying capacities and use
learnable routers to allocate each token to a suitable adapter. By combining
PEFT and MoA, we effectively alleviate the performance deterioration caused by
distribution shifts and achieve state-of-the-art performance on diverse DG
benchmarks.
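The core MoA idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the hidden size, the bottleneck dimensions, the ReLU bottleneck adapters, and hard top-1 routing are all assumptions made for the sketch; the paper only specifies multiple adapters of varying capacity plus a learnable per-token router.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16                 # assumed hidden size for the sketch
bottlenecks = [2, 4, 8]      # adapters of varying capacity (assumed sizes)

# Each adapter is a standard down-projection / up-projection bottleneck.
adapters = [
    (rng.normal(0, 0.02, (d_model, r)), rng.normal(0, 0.02, (r, d_model)))
    for r in bottlenecks
]
router_w = rng.normal(0, 0.02, (d_model, len(adapters)))  # learnable router

def mixture_of_adapters(tokens):
    # tokens: (n_tokens, d_model). The router scores every token per adapter.
    logits = tokens @ router_w
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    # Hard top-1 routing: each token is sent to its highest-scoring adapter.
    choice = probs.argmax(-1)
    out = tokens.copy()
    for i, (down, up) in enumerate(adapters):
        mask = choice == i
        # Residual adapter update for the tokens routed to adapter i.
        out[mask] += np.maximum(tokens[mask] @ down, 0) @ up
    return out, choice

x = rng.normal(size=(5, d_model))
y, routes = mixture_of_adapters(x)  # y: (5, 16), routes: adapter index per token
```

The varying bottleneck widths are what let the router match adapter capacity (and hence regularization strength) to each token, which is the property the abstract argues matters under strong distribution shift.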
Related papers
- Feature Protection For Out-of-distribution Generalization [24.072876186625855]
We show that protecting pre-trained features leads to a fine-tuned model that is more robust to OOD generalization.
arXiv Detail & Related papers (2024-05-25T03:00:06Z)
- Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks [2.834743715323873]
We present an investigation into the suitability of different PEFT methods to clinical decision-making tasks.
Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another.
The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure.
arXiv Detail & Related papers (2024-02-16T11:30:11Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Model Agnostic Sample Reweighting for Out-of-Distribution Learning [38.843552982739354]
We propose a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address the OOD problem.
Our key idea is to find an effective reweighting of the training samples so that the standard empirical risk minimization training of a large model leads to superior OOD generalization performance.
arXiv Detail & Related papers (2023-01-24T05:11:03Z)
- Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (the amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z)
- SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
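The LoRA method compared above reduces to adding a trainable low-rank update to a frozen pretrained weight. A minimal NumPy sketch, with assumed dimensions (`d`, rank `r`) and the conventional zero initialization of the up-projection:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hypothetical hidden size and LoRA rank

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(0, 0.02, size=(r, d))  # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init

def lora_forward(x, scale=1.0):
    # Frozen path plus low-rank trainable update: x @ (W + scale * B @ A).T
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.normal(size=(3, d))
# With B zero-initialized, LoRA starts out identical to the frozen model.
assert np.allclose(lora_forward(x), x @ W.T)
```

Only `A` and `B` (2 * r * d parameters) are trained, which is why LoRA both cuts fine-tuning cost and, per the finding above, constrains the update enough to help OOD generalization.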
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
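The noise-injection step described above can be sketched in a few lines. This is an illustrative toy, not the LNSR implementation: the single `tanh` layer, the noise scale, and the squared-error penalty form are assumptions; the summary only states that standard Gaussian noise is injected and the hidden representations are regularized.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))  # stand-in for one fine-tuned layer's weights

def layer(h):
    # Toy hidden layer whose representations we want to keep stable.
    return np.tanh(h @ W)

def noise_stability_penalty(h, sigma=0.1):
    # Inject scaled standard Gaussian noise into the input and penalize
    # how much the layer's hidden representation changes.
    noise = sigma * rng.normal(size=h.shape)
    return float(np.mean((layer(h + noise) - layer(h)) ** 2))

h = rng.normal(size=(4, d))
penalty = noise_stability_penalty(h)  # added to the task loss during fine-tuning
```

A layerwise version would sum this penalty over every (or selected) hidden layers of the fine-tuned model, trading a small compute overhead for smoother, more transferable representations.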
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.