Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing
- URL: http://arxiv.org/abs/2112.07869v1
- Date: Wed, 15 Dec 2021 04:20:35 GMT
- Title: Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing
- Authors: Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan
Naumann, Jianfeng Gao, Hoifung Poon
- Abstract summary: We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
- Score: 55.52858954615655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: A perennial challenge for biomedical researchers and clinical
practitioners is to stay abreast with the rapid growth of publications and
medical notes. Natural language processing (NLP) has emerged as a promising
direction for taming information overload. In particular, large neural language
models facilitate transfer learning by pretraining on unlabeled text, as
exemplified by the successes of BERT models in various NLP applications.
However, fine-tuning such models for an end task remains challenging,
especially with small labeled datasets, which are common in biomedical NLP.
Results: We conduct a systematic study on fine-tuning stability in biomedical
NLP. We show that fine-tuning performance may be sensitive to pretraining
settings, especially in low-resource domains. Large models have potential to
attain better performance, but increasing model size also exacerbates
fine-tuning instability. We thus conduct a comprehensive exploration of
techniques for addressing fine-tuning instability. We show that these
techniques can substantially improve fine-tuning performance for low-resource
biomedical NLP applications. Specifically, freezing lower layers is helpful for
standard BERT-BASE models, while layerwise decay is more effective for
BERT-LARGE and ELECTRA models. For low-resource text similarity tasks such as
BIOSSES, reinitializing the top layer is the optimal strategy. Overall,
domain-specific vocabulary and pretraining facilitate more robust models for
fine-tuning. Based on these findings, we establish new state of the art on a
wide range of biomedical NLP applications.
Availability and implementation: To facilitate progress in biomedical NLP, we
release our state-of-the-art pretrained and fine-tuned models:
https://aka.ms/BLURB.
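To make the stabilization techniques named in the abstract concrete, here is a minimal PyTorch/HuggingFace sketch of layer freezing, layer-wise learning-rate decay, and top-layer re-initialization. This is an illustrative sketch, not the released code: the checkpoint name, number of frozen layers, learning rates, and decay factor are assumptions, and in practice one strategy is chosen per model family (freezing for BERT-BASE, layer-wise decay for BERT-LARGE/ELECTRA, top-layer re-initialization for BIOSSES-style low-resource tasks).

```python
# Illustrative sketch of the three fine-tuning stabilization techniques.
# Checkpoint and hyperparameters are placeholders, not the paper's settings.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # placeholder checkpoint and label count
)
encoder_layers = model.bert.encoder.layer  # 12 transformer layers for BERT-BASE

# (1) Freeze the embeddings and lower encoder layers (helpful for BERT-BASE).
for module in [model.bert.embeddings, *encoder_layers[:6]]:
    for param in module.parameters():
        param.requires_grad = False

# (2) Layer-wise learning-rate decay (more effective for BERT-LARGE / ELECTRA):
#     earlier layers receive geometrically smaller learning rates.
base_lr, decay = 2e-5, 0.9
param_groups = [{"params": model.bert.embeddings.parameters(),
                 "lr": base_lr * decay ** (len(encoder_layers) + 1)}]
for i, layer in enumerate(encoder_layers):
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr * decay ** (len(encoder_layers) - i)})
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})
optimizer = torch.optim.AdamW(param_groups)

# (3) Re-initialize the top encoder layer before fine-tuning (reported best
#     for low-resource similarity tasks such as BIOSSES).
encoder_layers[-1].apply(model._init_weights)
```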
Related papers
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping computing requirements low.
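As a rough illustration of the adapter idea described in this entry, the sketch below shows a standard bottleneck adapter in PyTorch; the hidden and bottleneck sizes, and where the module is inserted, are assumptions rather than the authors' exact architecture.

```python
# Minimal bottleneck-adapter sketch (generic, not the paper's exact module).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small trainable module inserted after a frozen transformer sub-layer."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representations intact;
        # only the small down/up projections are trained, keeping compute low.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
out = adapter(torch.randn(2, 16, 768))  # (batch, sequence, hidden)
```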
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Improving Biomedical Entity Linking with Retrieval-enhanced Learning [53.24726622142558]
$k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
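A toy sketch of the retrieval-enhanced idea: at prediction time, the query embedding is compared against cached training embeddings and the retrieved neighbours' labels are interpolated with the base model's scores. The encoder, interpolation weight, and temperature are assumptions, and the generic labels stand in for candidate entity IDs.

```python
# Generic kNN-augmented prediction sketch (not the paper's exact formulation).
import torch
import torch.nn.functional as F

def knn_augmented_probs(query_emb, train_embs, train_labels, model_logits,
                        k=8, num_classes=10, alpha=0.5, temperature=0.1):
    # Cosine similarity between the query and every cached training instance.
    sims = F.cosine_similarity(query_emb.unsqueeze(0), train_embs, dim=-1)
    top_sim, top_idx = sims.topk(k)
    # Turn neighbour similarities into a distribution over their labels.
    weights = torch.softmax(top_sim / temperature, dim=-1)
    knn_probs = torch.zeros(num_classes)
    knn_probs.scatter_add_(0, train_labels[top_idx], weights)
    # Interpolate the base model's distribution with the kNN distribution.
    return alpha * torch.softmax(model_logits, dim=-1) + (1 - alpha) * knn_probs
```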
arXiv Detail & Related papers (2023-12-15T14:04:23Z)
- BIOptimus: Pre-training an Optimal Biomedical Language Model with Curriculum Learning for Named Entity Recognition [0.0]
Using language models (LMs) pre-trained in a self-supervised setting on large corpora has helped to deal with the problem of limited labeled data.
Recent research in biomedical language processing has produced a number of pre-trained biomedical LMs.
This paper aims to investigate different pre-training methods, such as pre-training the biomedical LM from scratch and pre-training it in a continued fashion.
arXiv Detail & Related papers (2023-08-16T18:48:01Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
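As a small illustration of the knowledge-distillation component mentioned above, the loss below lets a compact student match a large teacher's softened output distribution while still fitting the gold labels; the temperature and mixing weight are assumptions, not the study's configuration.

```python
# Toy distillation loss (soft teacher targets + hard gold labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale the soft-target term back to original magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```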
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- Sparse*BERT: Sparse Models Generalize to New Tasks and Domains [79.42527716035879]
This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks.
We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text.
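A minimal sketch of unstructured magnitude pruning with torch.nn.utils.prune is shown below; the per-step amount and schedule are assumptions, not the Sparse*BERT recipe.

```python
# Remove the smallest-magnitude weights from every Linear layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_step(model: nn.Module, amount: float = 0.1) -> None:
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)

# "Gradual" pruning interleaves such steps with further training so the
# network can recover accuracy before the next increase in sparsity.
```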
arXiv Detail & Related papers (2022-05-25T02:51:12Z)
- Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
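A toy sketch of prompt learning with a frozen masked LM: the task is cast as filling a [MASK] slot and label words are scored at that position. The checkpoint, template, and verbalizer below are illustrative assumptions, not the clinical decision tasks studied in the paper.

```python
# Prompt-based classification with a frozen masked LM (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()  # backbone stays frozen; only prompts/verbalizers would be tuned

verbalizer = {"yes": 1, "no": 0}  # label words mapped to task labels
text = "Patient was discharged in stable condition."  # hypothetical note
prompt = f"{text} Readmitted within thirty days? [MASK]."

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
word_ids = [tokenizer.convert_tokens_to_ids(w) for w in verbalizer]
scores = logits[0, mask_pos, word_ids]
prediction = list(verbalizer.values())[int(scores.argmax())]
```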
arXiv Detail & Related papers (2022-05-11T14:25:13Z)
- BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
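The weight-averaging step can be pictured as in the small sketch below, where floating-point parameters of the previous and newly trained models are interpolated; the 50/50 split is an assumption, not WEAVER's exact scheme.

```python
# Generic weight-averaging sketch for continual updates (illustrative only).
import copy
import torch.nn as nn

def weave(old_model: nn.Module, new_model: nn.Module, beta: float = 0.5) -> nn.Module:
    """Return a copy of new_model whose weights are blended with old_model's."""
    merged = copy.deepcopy(new_model)
    old_state = old_model.state_dict()
    merged_state = merged.state_dict()
    for name, param in merged_state.items():
        if param.is_floating_point():  # skip integer buffers such as position ids
            merged_state[name] = beta * old_state[name] + (1.0 - beta) * param
    merged.load_state_dict(merged_state)
    return merged
```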
arXiv Detail & Related papers (2022-02-21T10:34:41Z)
- GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain [5.479164650793012]
We investigate the performance of two powerful transformer language models, i.e. GPT-3 and BioBERT, in few-shot settings on various biomedical NLP tasks.
Although GPT-3 had achieved near state-of-the-art results in few-shot knowledge transfer on open-domain NLP tasks, it could not perform as effectively as BioBERT on biomedical tasks.
arXiv Detail & Related papers (2021-09-06T15:50:37Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)