Lightweight Transformers for Clinical Natural Language Processing
- URL: http://arxiv.org/abs/2302.04725v1
- Date: Thu, 9 Feb 2023 16:07:31 GMT
- Title: Lightweight Transformers for Clinical Natural Language Processing
- Authors: Omid Rohanian, Mohammadmahdi Nouriborji, Hannah Jauncey, Samaneh
Kouchaki, ISARIC Clinical Characterisation Group, Lei Clifton, Laura Merson,
David A. Clifton
- Abstract summary: This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
- Score: 9.532776962985828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Specialised pre-trained language models are becoming increasingly common in NLP since they can outperform models trained on generic texts. BioBERT
and BioClinicalBERT are two examples of such models that have shown promise in
medical NLP tasks. Many of these models are overparametrised and
resource-intensive, but thanks to techniques like Knowledge Distillation (KD),
it is possible to create smaller versions that perform almost as well as their
larger counterparts. In this work, we specifically focus on the development of
compact language models for processing clinical texts (e.g. progress notes,
discharge summaries). We developed a number of efficient lightweight
clinical transformers using knowledge distillation and continual learning, with
the number of parameters ranging from 15 million to 65 million. These models
performed comparably to larger models such as BioBERT and BioClinicalBERT and
significantly outperformed other compact models trained on general or
biomedical data. Our extensive evaluation was done across several standard
datasets and covered a wide range of clinical text-mining tasks, including
Natural Language Inference, Relation Extraction, Named Entity Recognition, and
Sequence Classification. To our knowledge, this is the first comprehensive
study specifically focused on creating efficient and compact transformers for
clinical NLP tasks. The models and code used in this study can be found on our
Huggingface profile at https://huggingface.co/nlpie and Github page at
https://github.com/nlpie-research/Lightweight-Clinical-Transformers,
respectively, promoting reproducibility of our results.
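As a hedged illustration of the core technique named above, the sketch below shows a standard Hinton-style logit-distillation objective in PyTorch. The function name, temperature, and loss weighting are illustrative assumptions, not the paper's actual recipe, which also involves continual learning and architecture-specific losses.

```python
# Minimal sketch of a logit-distillation loss, assuming a PyTorch training
# loop. Temperature, weighting, and names are illustrative assumptions.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and hard-label cross-entropy."""
    # Soft targets: match the teacher's temperature-scaled output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

Architecture-specific students (e.g. TinyBERT- or MobileBERT-style models) typically add intermediate-layer losses on hidden states and attention maps on top of a logit term like this.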
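Since the checkpoints are released on the nlpie Hugging Face profile, a brief usage sketch with the `transformers` library follows. The checkpoint identifier is an assumption for illustration; consult https://huggingface.co/nlpie for the exact model names.

```python
# Usage sketch: loading one of the released compact clinical checkpoints with
# the Hugging Face `transformers` library. The model name is an assumed
# identifier; see https://huggingface.co/nlpie for the exact names.
from transformers import AutoTokenizer, AutoModel

model_name = "nlpie/tiny-clinicalbert"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "Patient was started on metformin for type 2 diabetes."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

The loaded encoder can then be fine-tuned for the downstream tasks listed above (e.g. via AutoModelForTokenClassification for NER or AutoModelForSequenceClassification for NLI and sequence classification).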
Related papers
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping computational requirements low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Do We Still Need Clinical Language Models? [15.023633270864675]
We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
arXiv Detail & Related papers (2023-02-16T05:08:34Z)
- A Comparative Study of Pretrained Language Models for Long Clinical Text [4.196346055173027]
We introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus.
We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.
arXiv Detail & Related papers (2023-01-27T16:50:29Z)
- On the Effectiveness of Compact Biomedical Transformers [12.432191400869002]
Language models pre-trained on biomedical corpora have recently shown promising results on downstream biomedical tasks.
Many existing pre-trained models are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers.
We introduce six lightweight models, namely, BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT.
We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1, with the aim of creating efficient lightweight models that perform on par with their larger counterparts.
arXiv Detail & Related papers (2022-09-07T14:24:04Z)
- Sparse*BERT: Sparse Models Generalize To New tasks and Domains [79.42527716035879]
This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks.
We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text.
arXiv Detail & Related papers (2022-05-25T02:51:12Z)
- Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences [4.196346055173027]
Transformers-based models, such as BERT, have dramatically improved the performance for various natural language processing tasks.
One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism.
We introduce two domain enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora.
arXiv Detail & Related papers (2022-01-27T22:51:58Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains, and that the techniques we study can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, i.e. identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining [17.10823632511911]
We study a multi-task learning model with multiple decoders on a variety of biomedical and clinical natural language processing tasks.
Our empirical results demonstrate that the MTL fine-tuned models outperform state-of-the-art transformer models.
arXiv Detail & Related papers (2020-05-06T13:25:21Z)
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [117.67424061746247]
We present a simple and effective approach to compress large Transformer based pre-trained models.
We propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student.
Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different parameter sizes of student models.
arXiv Detail & Related papers (2020-02-25T15:21:10Z)
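The MiniLM entry above hinges on distilling the self-attention module of the teacher's last Transformer layer. The snippet below is a simplified sketch of the attention-distribution part of such an objective only; the full method also transfers value-relation distributions, and the names and shapes here are illustrative assumptions.

```python
# Simplified sketch of last-layer self-attention transfer in the spirit of
# MiniLM: push the student's attention distributions toward the teacher's
# via KL divergence. Names and tensor shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def attention_transfer_loss(student_attn, teacher_attn, eps=1e-12):
    """KL(teacher || student) over self-attention distributions.

    Both tensors have shape (batch, heads, seq_len, seq_len) and are already
    softmax-normalised over the last dimension. Using only the last layer of
    teacher and student avoids any layer-to-layer mapping.
    """
    return F.kl_div(
        torch.log(student_attn + eps),  # student log-probabilities
        teacher_attn,                   # teacher probabilities (target)
        reduction="batchmean",
    )
```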
This list is automatically generated from the titles and abstracts of the papers in this site.