FDAPT: Federated Domain-adaptive Pre-training for Language Models
- URL: http://arxiv.org/abs/2307.06933v2
- Date: Thu, 9 Nov 2023 16:57:47 GMT
- Title: FDAPT: Federated Domain-adaptive Pre-training for Language Models
- Authors: Lekang Jiang, Filip Svoboda, Nicholas D. Lane
- Abstract summary: This paper tackles the specific case of Domain-Adaptive Pre-Training (DAPT).
We conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-Adaptive Pre-Training (FDAPT).
We propose a novel algorithm, Frozen Federated Domain-Adaptive Pre-Training (FFDAPT).
- Score: 15.755622890097941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models (FMs) have shown prominent success in a wide range of
tasks. Their applicability to specific domain-task pairings relies on the
availability of both high-quality data and significant computational
resources. These challenges are not new to the field and, indeed, Federated
Learning (FL) has been shown to be a promising solution in similar setups. This
paper tackles the specific case of Domain-Adaptive Pre-Training (DAPT), a key
step in the application of FMs. We conduct the first comprehensive empirical
study to evaluate the performance of Federated Domain-Adaptive Pre-Training
(FDAPT). We demonstrate that FDAPT can maintain downstream task performance
competitive with the centralized baseline in both IID and non-IID situations.
Finally, we propose a novel algorithm, Frozen Federated Domain-Adaptive
Pre-Training (FFDAPT). FFDAPT improves computational efficiency by 12.1% on
average and exhibits downstream task performance similar to vanilla FDAPT, with
general performance fluctuations remaining below 1%.
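As a concrete reading of the abstract, the sketch below simulates FDAPT-style training on toy data: each client continues masked-language-model pre-training on its private domain shard, and a server averages the resulting weights with FedAvg; a `freeze_embeddings` flag mimics the frozen idea behind FFDAPT by keeping part of the model fixed during local training. The tiny model, the choice of frozen parameters, and all hyper-parameters are illustrative assumptions rather than the paper's actual configuration.

```python
import copy

import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 1000, 64, 0   # toy vocabulary; id 0 stands in for [MASK]


class TinyMLM(nn.Module):
    """A small stand-in for a transformer encoder with a masked-LM head."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))


def local_dapt(global_state, corpus, freeze_embeddings=False, steps=10):
    """One client's local continued pre-training pass; returns its new weights."""
    model = TinyMLM()
    model.load_state_dict(global_state)
    if freeze_embeddings:            # FFDAPT-like: keep part of the model frozen
        for p in model.embed.parameters():
            p.requires_grad_(False)
    opt = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)
    for _ in range(steps):
        ids = corpus[torch.randint(len(corpus), (8,))]     # sample a batch
        mask = torch.rand(ids.shape) < 0.15                # mask 15% of tokens
        masked, labels = ids.clone(), ids.clone()
        masked[mask] = MASK_ID
        labels[~mask] = -100                               # loss on masked tokens only
        loss = nn.functional.cross_entropy(
            model(masked).flatten(0, 1), labels.flatten(), ignore_index=-100)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.state_dict()


def fedavg(states):
    """Uniform FedAvg: average every parameter tensor across clients."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(0)
    return avg


# Three clients, each holding a private shard of the domain corpus
# (random token ids stand in for tokenized text).
clients = [torch.randint(1, VOCAB, (256, 32)) for _ in range(3)]
global_state = TinyMLM().state_dict()
for _ in range(2):                   # a couple of federated rounds
    local_states = [local_dapt(global_state, shard, freeze_embeddings=True)
                    for shard in clients]
    global_state = fedavg(local_states)
```

Only the structure of the federated loop is meant to carry over; in practice each client would train a full-size language model on real tokenized domain text.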
Related papers
- Unveiling the Superior Paradigm: A Comparative Study of Source-Free Domain Adaptation and Unsupervised Domain Adaptation [52.36436121884317]
We show that Source-Free Domain Adaptation (SFDA) generally outperforms Unsupervised Domain Adaptation (UDA) in real-world scenarios.
SFDA offers advantages in time efficiency, storage requirements, targeted learning objectives, reduced risk of negative transfer, and increased robustness against overfitting.
We propose a novel weight estimation method that effectively integrates available source data into multi-SFDA approaches.
arXiv Detail & Related papers (2024-11-24T13:49:29Z)
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
The lack of a clear validation protocol for DA has led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- Open-Set Domain Adaptation with Visual-Language Foundation Models [51.49854335102149]
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge from a source domain to a target domain with unlabeled data.
Open-set domain adaptation (ODA) has emerged as a potential solution to identify target classes absent from the source domain during the training phase.
arXiv Detail & Related papers (2023-07-30T11:38:46Z)
- Key Design Choices for Double-Transfer in Source-Free Unsupervised Domain Adaptation [18.21955526087808]
This paper provides the first in-depth analysis of the main design choices in Source-Free Unsupervised Domain Adaptation (SF-UDA).
We pinpoint the normalization approach, pre-training strategy, and backbone architecture as the most critical factors.
We show that SF-UDA is also competitive beyond standard benchmarks and backbone architectures, performing on par with UDA at a fraction of the data and computational cost.
arXiv Detail & Related papers (2023-02-10T17:00:37Z)
- AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model [16.657197699107396]
We propose a continual pretraining method for BERT-based models, named Attention-FFN Adapter (AF Adapter).
Its main idea is to introduce a small number of attention heads and hidden units inside each self-attention layer and feed-forward network.
With only about 17% of model parameters trained, AF Adapter achieves average performance gains of 0.6% and 2% compared to strong baselines.
arXiv Detail & Related papers (2022-11-21T11:30:13Z)
- Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA).
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
arXiv Detail & Related papers (2022-08-26T19:50:46Z)
- Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning.
Recent work shows that the two-way partial AUC (TPAUC) is essentially inconsistent with existing partial AUC metrics.
In this paper, we present the first attempt to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z)
- Knowledge Distillation for BERT Unsupervised Domain Adaptation [2.969705152497174]
A pre-trained language model, BERT, has brought significant performance improvements across a range of natural language processing tasks.
We propose a simple but effective unsupervised domain adaptation method, adversarial adaptation with distillation (AAD).
We evaluate our approach in the task of cross-domain sentiment classification on 30 domain pairs.
arXiv Detail & Related papers (2020-10-22T06:51:24Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains (a minimal sketch of this step follows the list).
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
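The last entry above ("Don't Stop Pretraining") describes the non-federated recipe that FDAPT builds on: a second phase of masked-language-model pre-training on unlabeled in-domain text before task fine-tuning. Below is a minimal sketch of that step using Hugging Face Transformers; the `bert-base-uncased` checkpoint, the two-sentence toy corpus, and the hyper-parameters are illustrative assumptions, not the settings of any paper listed here.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "bert-base-uncased"                      # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Unlabeled in-domain text; in practice this would be a large domain corpus.
corpus = Dataset.from_dict({"text": [
    "The patient presented with acute myocardial infarction.",
    "Randomized controlled trials reported reduced mortality.",
]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Dynamic 15% token masking, i.e. the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-checkpoint",
                           num_train_epochs=1,
                           per_device_train_batch_size=8,
                           report_to="none"),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()   # the resulting weights are then fine-tuned on the downstream task
```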
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.