GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain
- URL: http://arxiv.org/abs/2109.02555v1
- Date: Mon, 6 Sep 2021 15:50:37 GMT
- Title: GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain
- Authors: Milad Moradi, Kathrin Blagec, Florian Haberl, Matthias Samwald
- Abstract summary: We investigate the performance of two powerful transformer language models, i.e. GPT-3 and BioBERT, in few-shot settings on various biomedical NLP tasks.
GPT-3 had already achieved near state-of-the-art results in few-shot knowledge transfer on open-domain NLP tasks, but it could not perform as effectively as BioBERT.
- Score: 5.479164650793012
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural language models have set new breakthroughs in many tasks of
Natural Language Processing (NLP). Recent work has shown that deep transformer
language models (pretrained on large amounts of texts) can achieve high levels
of task-specific few-shot performance comparable to state-of-the-art models.
However, the ability of these large language models in few-shot transfer
learning has not yet been explored in the biomedical domain. We investigated
the performance of two powerful transformer language models, i.e. GPT-3 and
BioBERT, in few-shot settings on various biomedical NLP tasks. The experimental
results showed that, to a great extent, both models underperform a language
model fine-tuned on the full training data. Although GPT-3 had already achieved
near state-of-the-art results in few-shot knowledge transfer on open-domain NLP
tasks, it could not perform as effectively as BioBERT, which is orders of
magnitude smaller than GPT-3. Given that BioBERT was already pretrained on large biomedical text corpora, our study suggests that language models may benefit substantially from in-domain pretraining in task-specific few-shot learning.
However, in-domain pretraining seems not to be sufficient; novel pretraining
and few-shot learning strategies are required in the biomedical NLP domain.
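As a concrete illustration of the few-shot setting evaluated in the abstract, the sketch below builds a k-shot prompt for a GPT-3-style completion model on a toy biomedical sentence-classification task. The task, labels, example sentences, and prompt template are illustrative assumptions only and are not taken from the paper's actual benchmarks or prompts.

```python
# Minimal sketch of k-shot prompt construction for a GPT-3-style completion
# model. The task, labels, and examples below are hypothetical; the paper's
# actual biomedical benchmarks and prompt templates may differ.

from typing import List, Tuple

# Hypothetical labeled demonstrations (the "k shots") for a toy task:
# does the sentence report an adverse drug event?
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("The patient developed a rash after starting amoxicillin.", "adverse event"),
    ("Vitamin D levels were measured at the baseline visit.", "no adverse event"),
    ("Severe nausea was reported following the second dose.", "adverse event"),
]


def build_few_shot_prompt(query_sentence: str,
                          shots: List[Tuple[str, str]] = FEW_SHOT_EXAMPLES) -> str:
    """Concatenate k labeled demonstrations followed by the unlabeled query.

    A completion model is then expected to continue the text after the final
    'Label:' with one of the label strings seen in the demonstrations.
    """
    header = "Classify each sentence as 'adverse event' or 'no adverse event'.\n\n"
    demos = "".join(
        f"Sentence: {text}\nLabel: {label}\n\n" for text, label in shots
    )
    query = f"Sentence: {query_sentence}\nLabel:"
    return header + demos + query


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "The patient experienced dizziness two hours after taking the drug."
    )
    print(prompt)  # This string would be sent to a completion endpoint.
```

A GPT-3-like model would be asked to continue the returned string after the final "Label:", whereas the BioBERT baseline in the abstract is instead fine-tuned on labeled examples, which is where in-domain pretraining pays off in the low-data regime.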
Related papers
- Elsevier Arena: Human Evaluation of Chemistry/Biology/Health Foundational Large Language Models [0.038696580294804606]
We describe a human evaluation experiment focused on the biomedical domain (health, biology, chemistry/pharmacology) carried out at Elsevier.
In it, a large but not massive (8.8B-parameter) decoder-only foundational transformer trained on a relatively small dataset (135B tokens) is compared to OpenAI's GPT-3.5-turbo and Meta's foundational 7B-parameter Llama 2 model.
Results indicate a preference towards GPT-3.5-turbo (even though IRR scores were generally low), and hence towards models that possess conversational abilities, are very large, and were trained on very large datasets.
arXiv Detail & Related papers (2024-09-09T10:30:00Z) - How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction? [1.7555695340815782]
General-domain models typically outperformed biomedical-domain models.
Biomedical instruction finetuning improved performance to a similar degree as general instruction finetuning.
Our findings suggest it may be more fruitful to focus research effort on larger-scale biomedical instruction finetuning of general LMs.
arXiv Detail & Related papers (2024-02-21T01:57:58Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z) - Sparse*BERT: Sparse Models Generalize To New tasks and Domains [79.42527716035879]
This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks.
We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text.
arXiv Detail & Related papers (2022-05-25T02:51:12Z) - Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that the stabilization techniques studied can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z) - Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z) - Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z) - An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining [17.10823632511911]
We study a multi-task learning model with multiple decoders on a variety of biomedical and clinical natural language processing tasks.
Our empirical results demonstrate that the MTL fine-tuned models outperform state-of-the-art transformer models.
arXiv Detail & Related papers (2020-05-06T13:25:21Z)