Distilling Large Language Models for Efficient Clinical Information Extraction
- URL: http://arxiv.org/abs/2501.00031v1
- Date: Sat, 21 Dec 2024 02:15:29 GMT
- Title: Distilling Large Language Models for Efficient Clinical Information Extraction
- Authors: Karthik S. Vedula, Annika Gupta, Akshay Swaminathan, Ivan Lopez, Suhana Bedi, Nigam H. Shah,
- Abstract summary: We evaluate the performance of distilled BERT models, which are approximately 1,000 times smaller than modern LLMs. We leveraged state-of-the-art LLMs (Gemini and OpenAI models) and medical ontologies (RxNorm and SNOMED) as teacher labelers for medication, disease, and symptom extraction. We applied our approach to over 3,300 clinical notes spanning five publicly available datasets.
- Score: 2.953317125529822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) excel at clinical information extraction, but their computational demands limit practical deployment. Knowledge distillation, the process of transferring knowledge from larger to smaller models, offers a potential solution. We evaluate the performance of distilled BERT models, which are approximately 1,000 times smaller than modern LLMs, for clinical named entity recognition (NER) tasks. We leveraged state-of-the-art LLMs (Gemini and OpenAI models) and medical ontologies (RxNorm and SNOMED) as teacher labelers for medication, disease, and symptom extraction. We applied our approach to over 3,300 clinical notes spanning five publicly available datasets, comparing distilled BERT models against both their teacher labelers and BERT models fine-tuned on human labels. External validation was conducted using clinical notes from the MedAlign dataset. For disease extraction, F1 scores were 0.82 (teacher model), 0.89 (BioBERT trained on human labels), and 0.84 (BioBERT-distilled). For medication extraction, F1 scores were 0.84 (teacher model), 0.91 (BioBERT-human), and 0.87 (BioBERT-distilled). For symptom extraction, F1 scores were 0.73 (teacher model) and 0.68 (BioBERT-distilled). Distilled BERT models had faster inference (12x, 4x, and 8x faster than GPT-4o, o1-mini, and Gemini Flash, respectively) and lower costs (85x, 101x, and 2x cheaper than GPT-4o, o1-mini, and Gemini Flash, respectively). On the external validation dataset, the distilled BERT model achieved F1 scores of 0.883 (medication), 0.726 (disease), and 0.699 (symptom). Distilled BERT models were up to 101x cheaper and 12x faster than state-of-the-art LLMs while achieving similar performance on NER tasks. Distillation offers a computationally efficient and scalable alternative to LLMs for clinical information extraction.
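The distillation recipe above can be illustrated with a short sketch, assuming the LLM teacher labels have already been converted to word-level BIO tags; the checkpoint, tag set, and example note below are illustrative choices, not the authors' exact configuration.

```python
# Minimal sketch: fine-tune a BioBERT token classifier on teacher-labeled data.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-MEDICATION", "I-MEDICATION"]          # illustrative tag set
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.1",
    num_labels=len(labels),
    id2label={i: l for l, i in label2id.items()},
    label2id=label2id,
)

# One teacher-labeled example: words from a clinical note plus BIO tags
# emitted by the LLM teacher (hypothetical data).
words = ["Patient", "started", "on", "metformin", "500mg", "daily"]
tags  = ["O", "O", "O", "B-MEDICATION", "I-MEDICATION", "O"]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)

# Align word-level teacher tags to wordpiece tokens; special tokens get -100
# so the cross-entropy loss ignores them.
aligned = [
    -100 if word_id is None else label2id[tags[word_id]]
    for word_id in enc.word_ids(batch_index=0)
]
labels_tensor = torch.tensor([aligned])

# One optimization step on the teacher-labeled example. A full run would loop
# over all ~3,300 notes with batching, scheduling, and held-out evaluation.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**enc, labels=labels_tensor).loss
loss.backward()
optimizer.step()
print(float(loss))
```

In the full pipeline, the resulting student is then compared against both its teacher labelers and a BERT model fine-tuned on human labels.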
Related papers
- Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation [0.6793286055326242]
We create a lightweight yet powerful BERT-based model for natural language processing applications.
We apply the resulting model, LastBERT, to a real-world task classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data.
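The summary describes distilling a large teacher into a compact BERT classifier; a minimal sketch of the standard soft-label distillation objective (Hinton-style), assuming access to teacher and student logits, follows. This is the generic formulation, not necessarily LastBERT's exact loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """KL divergence between temperature-softened teacher and student
    distributions, blended with cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: 4 examples, 3 severity classes (illustrative numbers only).
s = torch.randn(4, 3, requires_grad=True)
t = torch.randn(4, 3)
y = torch.tensor([0, 2, 1, 1])
print(distillation_loss(s, t, y))
```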
arXiv Detail & Related papers (2024-10-30T17:57:44Z)
- Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models [1.3137489010086167]
Mixtral, the student model, initially extracts symptoms, followed by GPT-4, the teacher model, which refines prompts based on Mixtral's performance.
Results showed significant improvements in extracting symptoms from both single and multi-symptom notes.
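A schematic of that teacher-student loop is sketched below, with `student_extract` and `teacher_refine_prompt` as placeholders for the Mixtral and GPT-4 calls; the overlap-F1 feedback signal and fixed round count are assumptions, not the paper's exact protocol.

```python
from typing import Callable, List

def iterative_prompt_refinement(
    notes: List[str],
    gold: List[set],
    prompt: str,
    student_extract: Callable[[str, str], set],          # (prompt, note) -> symptoms
    teacher_refine_prompt: Callable[[str, float], str],  # (prompt, score) -> new prompt
    rounds: int = 3,
) -> str:
    for _ in range(rounds):
        # Student extracts symptoms from every note with the current prompt.
        predictions = [student_extract(prompt, note) for note in notes]
        # Simple set-overlap F1 as the feedback signal (an assumption).
        tp = sum(len(p & g) for p, g in zip(predictions, gold))
        fp = sum(len(p - g) for p, g in zip(predictions, gold))
        fn = sum(len(g - p) for p, g in zip(predictions, gold))
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        # Teacher rewrites the prompt based on the student's performance.
        prompt = teacher_refine_prompt(prompt, f1)
    return prompt
```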
arXiv Detail & Related papers (2024-02-06T15:25:09Z)
- Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events [17.73671383380315]
We study how large language models (LLMs) can be used to scale biomedical knowledge curation.
We find that substantial gains can be attained over out-of-box LLMs, with additional advantages such as cost, efficiency, and white-box model access.
arXiv Detail & Related papers (2023-07-12T20:08:48Z)
- Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset [0.08192907805418582]
Transformer-based models have shown outstanding results in natural language processing but face challenges in applications like classifying small-scale clinical texts.
This study presents a customized Mixture of Experts (MoE) Transformer model for classifying small-scale French clinical texts at CHU Sainte-Justine Hospital.
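A minimal sketch of the kind of mixture-of-experts feed-forward block such models insert into a Transformer is shown below; the expert count, top-1 routing rule, and layer sizes are illustrative assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn

class TokenTopKMoE(nn.Module):
    """Top-1 mixture-of-experts feed-forward block: a router picks one expert
    MLP per token and weights its output by the routing probability."""
    def __init__(self, d_model=768, d_ff=2048, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (batch, seq, d_model)
        gate = self.router(x).softmax(dim=-1)    # routing probabilities
        top = gate.argmax(dim=-1)                # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top == i
            if mask.any():
                out[mask] = expert(x[mask]) * gate[..., i][mask].unsqueeze(-1)
        return out

moe = TokenTopKMoE()
print(moe(torch.randn(2, 16, 768)).shape)        # torch.Size([2, 16, 768])
```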
arXiv Detail & Related papers (2023-03-22T20:10:29Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
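A sketch of the BERT-plus-CRF tagger that comparison refers to is given below, assuming the `pytorch-crf` package for the CRF layer; the base checkpoint and tag count are placeholders.

```python
import torch
import torch.nn as nn
from torchcrf import CRF                      # pip install pytorch-crf
from transformers import AutoModel

class BertCrfTagger(nn.Module):
    def __init__(self, model_name="bert-base-cased", num_tags=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding yields contextually consistent tag sequences.
        return self.crf.decode(emissions, mask=mask)
```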
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
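A generic SimCLR-style contrastive (InfoNCE) loss of the kind such pretraining uses is sketched below for embeddings of two views of the same study; this illustrates the technique family, not EchoCLR's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss between two augmented views: matching pairs (the
    diagonal) are pulled together, all other pairs pushed apart."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature           # cosine similarities
    targets = torch.arange(z1.size(0))           # positives on the diagonal
    return F.cross_entropy(logits, targets)

print(info_nce(torch.randn(8, 128), torch.randn(8, 128)))
```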
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
- ADT-SSL: Adaptive Dual-Threshold for Semi-Supervised Learning [68.53717108812297]
Semi-Supervised Learning (SSL) has advanced classification tasks by inputting both labeled and unlabeled data to train a model jointly.
This paper proposes an Adaptive Dual-Threshold method for Semi-Supervised Learning (ADT-SSL).
Experimental results show that the proposed ADT-SSL achieves state-of-the-art classification accuracy.
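For context, the fixed-confidence pseudo-labeling step that threshold-based SSL methods build on can be sketched as below; ADT-SSL refines this kind of single hand-tuned cutoff, and the threshold value here is purely illustrative.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits, teacher_probs, threshold=0.95):
    """Unlabeled examples whose top predicted probability exceeds `threshold`
    contribute cross-entropy against their own pseudo-label; the rest are ignored."""
    conf, pseudo = teacher_probs.max(dim=-1)
    keep = conf >= threshold
    if not keep.any():
        return student_logits.new_zeros(())
    return F.cross_entropy(student_logits[keep], pseudo[keep])

# Toy usage with random predictions over 10 classes.
logits = torch.randn(32, 10)
probs = F.softmax(torch.randn(32, 10) * 3, dim=-1)
print(pseudo_label_loss(logits, probs))
```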
arXiv Detail & Related papers (2022-05-21T11:52:08Z)
- Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z)
- EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation [82.3956677850676]
Pre-trained language models have shown remarkable results on various NLP tasks.
Due to their bulky size and slow inference speed, it is hard to deploy them on edge devices.
In this paper, we present the critical insight that improving the feed-forward network (FFN) in BERT yields a higher gain than improving the multi-head attention (MHA).
arXiv Detail & Related papers (2021-09-15T11:25:39Z)
- Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? [70.3631443249802]
We design a battery of approaches intended to recover Personal Health Information from a trained BERT.
Specifically, we attempt to recover patient names and conditions with which they are associated.
We find that simple probing methods are not able to meaningfully extract sensitive information from a BERT model trained on the MIMIC-III corpus of EHRs.
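One simple probe in this spirit can be sketched as a masked-language-model query; the checkpoint and prompt below are illustrative stand-ins (a public MIMIC-III-trained model), not the paper's exact probing setup.

```python
from transformers import pipeline

# Ask a clinical masked LM to fill in a patient name from surrounding context.
fill = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT")
for cand in fill("Mr. [MASK] was admitted with community-acquired pneumonia."):
    print(cand["token_str"], round(cand["score"], 3))
```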
arXiv Detail & Related papers (2021-04-15T20:40:05Z)
- Pre-trained Summarization Distillation [121.14806854092672]
Recent work on distilling BERT for classification and regression tasks shows strong performance using direct knowledge distillation.
Alternatively, machine translation practitioners distill using pseudo-labeling, where a small model is trained on the translations of a larger model.
A third, simpler approach is to 'shrink and fine-tune' (SFT), which avoids any explicit distillation by copying parameters to a smaller student model and then fine-tuning.
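A sketch of the shrink-and-fine-tune idea is shown below for a BERT encoder (the cited work applies it to summarization models); the checkpoint, student depth, and every-other-layer selection heuristic are assumptions.

```python
from transformers import AutoConfig, AutoModel

teacher = AutoModel.from_pretrained("bert-base-uncased")            # 12 layers
config = AutoConfig.from_pretrained("bert-base-uncased", num_hidden_layers=6)
student = AutoModel.from_config(config)                             # 6 layers

# Copy embeddings, then every other encoder layer (0, 2, 4, ...), one common
# layer-selection heuristic.
student.embeddings.load_state_dict(teacher.embeddings.state_dict())
for s_idx, t_idx in enumerate(range(0, 12, 2)):
    student.encoder.layer[s_idx].load_state_dict(
        teacher.encoder.layer[t_idx].state_dict()
    )
# `student` is now ready for ordinary fine-tuning on the downstream task.
```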
arXiv Detail & Related papers (2020-10-24T23:15:43Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
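The ternarization step itself can be sketched with the standard Ternary Weight Networks rule (delta = 0.7 · mean|w|); TernaryBERT pairs such ternarization with distillation-aware training, and its exact quantizers may differ.

```python
import torch

def ternarize(w):
    """Map each weight to {-alpha, 0, +alpha}: zero out entries below a
    threshold delta and scale the survivors by their mean magnitude."""
    delta = 0.7 * w.abs().mean()
    mask = (w.abs() > delta).float()
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(w) * mask

w = torch.randn(768, 768)
print(torch.unique(ternarize(w)))   # three values: -alpha, 0, +alpha
```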
arXiv Detail & Related papers (2020-09-27T10:17:28Z)