Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
- URL: http://arxiv.org/abs/2310.11031v2
- Date: Sat, 07 Dec 2024 06:57:05 GMT
- Title: Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters
- Authors: Gyuseong Lee, Wooseok Jang, Jinhyeon Kim, Jaewoo Jung, Seungryong Kim
- Abstract summary: This study focuses on leveraging the knowledge of large pretrained models to improve handling of OOD scenarios and tackle domain generalization problems. We employ parameter-efficient fine-tuning (PEFT) techniques to effectively preserve OOD robustness while working with large models. Our experiments and analysis confirm that the most effective approaches involve ensembling diverse models and increasing the scale of pretraining.
- Score: 33.401355417911084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning robust vision models that perform well in out-of-distribution (OOD) situations is an important task for model deployment in real-world settings. Despite extensive research in this field, many proposed methods have only shown minor performance improvements compared to the simplest empirical risk minimization (ERM) approach, which was evaluated on a benchmark with a limited hyperparameter search space. Our focus in this study is on leveraging the knowledge of large pretrained models to improve handling of OOD scenarios and tackle domain generalization problems. However, prior research has revealed that naively fine-tuning a large pretrained model can impair OOD robustness. Thus, we employ parameter-efficient fine-tuning (PEFT) techniques to effectively preserve OOD robustness while working with large models. Our extensive experiments and analysis confirm that the most effective approaches involve ensembling diverse models and increasing the scale of pretraining. As a result, we achieve state-of-the-art performance in domain generalization tasks. Our code and project page are available at: https://cvlab-kaist.github.io/MoA
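To make the mixture-of-adapters idea concrete, here is a minimal PyTorch sketch that attaches several LoRA-style adapter experts to a frozen pretrained linear layer and combines their outputs with a learned router. This is an illustration of the general technique only; the class name `MoALinear` and the hyperparameters (`num_experts`, `rank`) are assumptions and not the authors' released implementation (see the project page above for that).

```python
# Minimal sketch of a mixture-of-adapters layer: LoRA-style experts on top of a
# frozen pretrained linear layer, mixed by a learned token-wise router.
# Names and hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class MoALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pretrained weights frozen
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        # Each expert is a low-rank (LoRA-style) update B @ A with rank << d.
        self.A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, d_out, rank))
        self.router = nn.Linear(d_in, num_experts)  # token-wise routing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_in)
        gate = torch.softmax(self.router(x), dim=-1)            # (..., E)
        low = torch.einsum("erd,...d->...er", self.A, x)        # (..., E, r)
        delta = torch.einsum("eor,...er->...eo", self.B, low)   # (..., E, d_out)
        mixed = (gate.unsqueeze(-1) * delta).sum(dim=-2)        # mix over experts
        return self.base(x) + mixed


# Hypothetical usage: wrap one projection of a pretrained transformer block.
layer = MoALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 16, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```

Only the adapter and router parameters are trainable here, which is what lets the approach preserve the pretrained (and OOD-robust) backbone while still adapting to the target tasks.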
Related papers
- Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning [55.384428765798496]
New data, such as e-commerce platform reviews, exhibits a long-tailed distribution.
This necessitates continuously learning from imbalanced data without forgetting.
We introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.
arXiv Detail & Related papers (2024-09-11T17:52:00Z) - Feature Protection For Out-of-distribution Generalization [24.072876186625855]
We show that protecting pre-trained features leads to a fine-tuned model more robust to OOD generalization.
arXiv Detail & Related papers (2024-05-25T03:00:06Z) - FL-TAC: Enhanced Fine-Tuning in Federated Learning via Low-Rank, Task-Specific Adapter Clustering [12.417857960556155]
Federated Learning (FL) offers a promising solution by enabling fine-tuning across large-scale clients with a variety of task data.
This paper addresses the high communication cost for fine-tuning large pre-trained models within FL frameworks through low-rank fine-tuning.
arXiv Detail & Related papers (2024-04-23T10:50:38Z) - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low and high compression rates.
arXiv Detail & Related papers (2024-03-14T09:06:49Z) - Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks [2.834743715323873]
We present an investigation into the suitability of different PEFT methods to clinical decision-making tasks.
Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another.
The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure.
arXiv Detail & Related papers (2024-02-16T11:30:11Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models [32.83440439290383]
We introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks.
Inspired by recent advancements in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques.
Our experiments in large-scale language-conditioned manipulation tasks suggest that TAIL with LoRA can achieve the best post-adaptation performance.
arXiv Detail & Related papers (2023-10-09T17:49:50Z) - MerA: Merging Pretrained Adapters For Few-Shot Learning [71.44422347502409]
We propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion.
Experiments on two PLMs demonstrate that MerA achieves substantial improvements over both single adapters and AdapterFusion.
arXiv Detail & Related papers (2023-08-30T12:10:17Z) - A Comprehensive Analysis of Adapter Efficiency [20.63580880344425]
We show that for Natural Language Understanding (NLU) tasks, the parameter efficiency in adapters does not translate to efficiency gains compared to full fine-tuning of models.
We recommend that for moderately sized models for NLU tasks, practitioners should rely on full fine-tuning or multi-task training rather than using adapters.
arXiv Detail & Related papers (2023-05-12T14:05:45Z) - Model Agnostic Sample Reweighting for Out-of-Distribution Learning [38.843552982739354]
We propose a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address the OOD problem.
Our key idea is to find an effective reweighting of the training samples so that the standard empirical risk minimization training of a large model leads to superior OOD generalization performance.
arXiv Detail & Related papers (2023-01-24T05:11:03Z) - Are Sample-Efficient NLP Models More Robust? [90.54786862811183]
We investigate the relationship between sample efficiency (the amount of data needed to reach a given ID accuracy) and robustness (how models fare on OOD evaluation).
We find that higher sample efficiency is only correlated with better average OOD robustness on some modeling interventions and tasks, but not others.
These results suggest that general-purpose methods for improving sample efficiency are unlikely to yield universal OOD robustness improvements, since such improvements are highly dataset- and task-dependent.
arXiv Detail & Related papers (2022-10-12T17:54:59Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - To Adapt or to Fine-tune: A Case Study on Abstractive Summarization [7.353994554197792]
Recent advances in the field of abstractive summarization leverage pre-trained language models rather than train a model from scratch.
Such models are sluggish to train and accompanied by a massive overhead.
It remains uncertain whether using adapters benefits the task of summarization, in terms of improved efficiency without an unpleasant sacrifice in performance.
arXiv Detail & Related papers (2022-08-30T22:48:28Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a rough sketch of this idea appears after this list).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z) - Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z) - Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
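As referenced in the LNSR entry above, the following is a rough sketch of noise-stability regularization during fine-tuning: hidden states are computed with and without Gaussian noise injected into the input embeddings, and their discrepancy is penalized alongside the task loss. It assumes a Hugging Face-style model that exposes input embeddings and hidden states; the helper name `lnsr_loss`, the noise scale `sigma`, the weight `lam`, and the choice to perturb the input embeddings are illustrative assumptions, and the paper's exact formulation may differ.

```python
# Rough sketch of noise-stability regularization for fine-tuning, assuming a
# transformers-style model with get_input_embeddings() and output_hidden_states.
import torch
import torch.nn.functional as F


def lnsr_loss(model, input_ids, attention_mask, labels, sigma=0.1, lam=1.0):
    """Task loss plus a penalty that keeps every layer's hidden states stable
    under Gaussian noise injected into the input embeddings (illustrative)."""
    embeds = model.get_input_embeddings()(input_ids)
    clean = model(inputs_embeds=embeds, attention_mask=attention_mask,
                  labels=labels, output_hidden_states=True)
    noisy = model(inputs_embeds=embeds + sigma * torch.randn_like(embeds),
                  attention_mask=attention_mask, output_hidden_states=True)
    # Penalize the shift of each layer's hidden representation under noise.
    stability = sum(F.mse_loss(h_c, h_n)
                    for h_c, h_n in zip(clean.hidden_states, noisy.hidden_states))
    return clean.loss + lam * stability
```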
This list is automatically generated from the titles and abstracts of the papers on this site.