AF Adapter: Continual Pretraining for Building Chinese Biomedical
Language Model
- URL: http://arxiv.org/abs/2211.11363v2
- Date: Fri, 20 Oct 2023 02:32:13 GMT
- Title: AF Adapter: Continual Pretraining for Building Chinese Biomedical
Language Model
- Authors: Yongyu Yan, Kui Xue, Xiaoming Shi, Qi Ye, Jingping Liu, Tong Ruan
- Abstract summary: We propose a continual pretraining method for the BERT-based model, named Attention-FFN Adapter.
Its main idea is to introduce a small number of attention heads and hidden units inside each self-attention layer and feed-forward network.
With only about 17% of the model's parameters trained, AF Adapter achieves average performance gains of 0.6% and 2% compared to strong baselines.
- Score: 16.657197699107396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual pretraining is a popular way of building a domain-specific
pretrained language model from a general-domain language model. In spite of its
high efficiency, continual pretraining suffers from catastrophic forgetting,
which may harm the model's performance in downstream tasks. To alleviate the
issue, in this paper, we propose a continual pretraining method for the
BERT-based model, named Attention-FFN Adapter. Its main idea is to introduce a
small number of attention heads and hidden units inside each self-attention
layer and feed-forward network. Furthermore, we train a domain-specific
language model named AF Adapter-based RoBERTa for the Chinese biomedical
domain. In experiments, models are applied to downstream tasks for evaluation.
The results demonstrate that, with only about 17% of the model's parameters trained,
AF Adapter achieves average performance gains of 0.6% and 2% compared to strong
baselines. Further experimental results show that our method alleviates the
catastrophic forgetting problem by 11% compared to the fine-tuning method.
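The abstract describes the mechanism only at a high level, so the following is a minimal PyTorch sketch of the stated idea: a few extra attention heads and FFN hidden units are placed alongside the frozen pretrained ones, and only the newly added parameters are updated during continual pretraining. Module names, dimensions, and the single-layer scope are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the Attention-FFN (AF) Adapter idea described above:
# the pretrained attention heads and FFN hidden units are frozen, a small
# number of extra heads / hidden units are added, and only the new
# parameters are trained on domain text. Names and sizes are illustrative.
import torch
import torch.nn as nn


class AFAdapterAttention(nn.Module):
    """Self-attention with a few extra trainable heads next to frozen pretrained heads."""

    def __init__(self, hidden=768, n_heads=12, head_dim=64, extra_heads=1):
        super().__init__()
        self.n_heads, self.extra_heads, self.head_dim = n_heads, extra_heads, head_dim
        # Pretrained projections (frozen; would be loaded from the base model).
        self.qkv = nn.Linear(hidden, 3 * n_heads * head_dim)
        self.out = nn.Linear(n_heads * head_dim, hidden)
        # Newly added projections for the extra heads (trainable).
        self.qkv_new = nn.Linear(hidden, 3 * extra_heads * head_dim)
        self.out_new = nn.Linear(extra_heads * head_dim, hidden, bias=False)
        self.qkv.requires_grad_(False)
        self.out.requires_grad_(False)

    def _split_heads(self, z, h):
        b, t, _ = z.shape
        q, k, v = z.chunk(3, dim=-1)
        return [u.view(b, t, h, self.head_dim).transpose(1, 2) for u in (q, k, v)]

    def forward(self, x):
        b, t, _ = x.shape
        # Pretrained and new heads attend jointly over the same sequence.
        q, k, v = [torch.cat(pair, dim=1) for pair in
                   zip(self._split_heads(self.qkv(x), self.n_heads),
                       self._split_heads(self.qkv_new(x), self.extra_heads))]
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        old, new = ctx.split([self.n_heads * self.head_dim,
                              self.extra_heads * self.head_dim], dim=-1)
        return self.out(old) + self.out_new(new)  # only the new projections are updated


class AFAdapterFFN(nn.Module):
    """Feed-forward network widened by a small block of trainable hidden units."""

    def __init__(self, hidden=768, inner=3072, extra=128):
        super().__init__()
        self.up, self.down = nn.Linear(hidden, inner), nn.Linear(inner, hidden)
        self.up_new = nn.Linear(hidden, extra)                 # trainable extra units
        self.down_new = nn.Linear(extra, hidden, bias=False)   # trainable extra units
        self.act = nn.GELU()
        self.up.requires_grad_(False)
        self.down.requires_grad_(False)

    def forward(self, x):
        return self.down(self.act(self.up(x))) + self.down_new(self.act(self.up_new(x)))


if __name__ == "__main__":
    x = torch.randn(2, 16, 768)
    attn, ffn = AFAdapterAttention(), AFAdapterFFN()
    print(attn(x).shape, ffn(x).shape)  # torch.Size([2, 16, 768]) twice
    trainable = sum(p.numel() for p in attn.parameters() if p.requires_grad)
    total = sum(p.numel() for p in attn.parameters())
    print(f"trainable fraction of this layer: {trainable / total:.1%}")
```

Because the added heads and hidden units are small relative to the frozen 12-head, 3072-unit pretrained layer, only a modest fraction of parameters is updated; this is the property the paper exploits to adapt to the biomedical domain while limiting catastrophic forgetting.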
Related papers
- StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention [2.66269503676104]
We introduce a novel fine-tuning method, called stochastic cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
arXiv Detail & Related papers (2024-02-25T13:53:49Z) - FDAPT: Federated Domain-adaptive Pre-training for Language Models [15.755622890097941]
This paper tackles the specific case of Domain-Adaptive Pre-Training (DAPT).
We conduct the first comprehensive empirical study to evaluate the performance of Federated Domain-Adaptive Pre-Training (FDAPT).
We propose a novel algorithm, Frozen Federated Domain-Adaptive Pre-Training (FFDAPT).
arXiv Detail & Related papers (2023-07-12T17:04:28Z) - An Empirical Analysis of Parameter-Efficient Methods for Debiasing
Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, with adapter tuning consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than BERT, and that these methods are less effective at mitigating racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - A Study on FGSM Adversarial Training for Neural Retrieval [3.2634122554914]
Neural retrieval models have achieved significant effectiveness gains over term-based methods in the last few years.
However, these models may be brittle when faced with typos or distribution shifts, and vulnerable to malicious attacks.
We show that one of the simplest adversarial training techniques, the Fast Gradient Sign Method (FGSM), can improve the robustness and effectiveness of first-stage rankers.
arXiv Detail & Related papers (2023-01-25T13:28:54Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a minimal sketch of this idea follows this list).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - UDALM: Unsupervised Domain Adaptation through Language Modeling [79.73916345178415]
We introduce UDALM, a fine-tuning procedure that uses a mixed classification and Masked Language Model loss.
Our experiments show that the performance of models trained with the mixed loss scales with the amount of available target data, and that the mixed loss can be effectively used as a stopping criterion.
Our method is evaluated on twelve domain pairs of the Amazon Reviews Sentiment dataset, yielding 91.74% accuracy, a 1.11% absolute improvement over the state-of-the-art.
arXiv Detail & Related papers (2021-04-14T19:05:01Z) - Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models
via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models on downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
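The noise stability regularization entry above compresses the method into a single line; the sketch below gives one plausible reading of it, using Hugging Face transformers: perturb the input embeddings with standard Gaussian noise and penalize how far the hidden states of each layer drift, adding that penalty to the ordinary fine-tuning loss. The model choice, noise scale, and loss weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of layerwise noise stability regularization as summarized
# above: add standard Gaussian noise to the input embeddings and penalize
# the layerwise drift of the hidden representations. Model name, noise
# scale, and weighting are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)


def noise_stability_loss(batch, sigma=0.01):
    """L2 distance between clean and noise-perturbed hidden states, averaged over layers."""
    emb = model.get_input_embeddings()(batch["input_ids"])
    clean = model(inputs_embeds=emb,
                  attention_mask=batch["attention_mask"]).hidden_states
    noisy = model(inputs_embeds=emb + sigma * torch.randn_like(emb),
                  attention_mask=batch["attention_mask"]).hidden_states
    return sum(F.mse_loss(n, c) for n, c in zip(noisy, clean)) / len(clean)


batch = tokenizer(["noise stability regularization example"], return_tensors="pt")
reg = noise_stability_loss(batch)
# In a fine-tuning loop this term would be scaled and added to the task loss:
# total_loss = task_loss + reg_weight * reg
print(float(reg))
```

In the paper's setting, the regularization weight and the choice of which layers to penalize would be hyperparameters; the sketch regularizes all layers uniformly for simplicity.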
This list is automatically generated from the titles and abstracts of the papers in this site.