MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
- URL: http://arxiv.org/abs/2403.00952v1
- Date: Fri, 1 Mar 2024 20:03:44 GMT
- Title: MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
- Authors: Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel
Hestness, Sean Lie
- Abstract summary: MediSwift is a suite of biomedical LMs that leverage sparse pre-training on domain-specific biomedical text data.
By inducing up to 75% weight sparsity during the pre-training phase, MediSwift achieves a 2-2.5x reduction in training FLOPs.
Our results show that sparse pre-training, along with dense fine-tuning and soft prompting, offers an effective method for creating high-performing, computationally efficient models in specialized domains.
- Score: 2.327390371420762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are typically trained on general source data for
various domains, but a recent surge in domain-specific LLMs has shown their
potential to outperform general-purpose models in domain-specific tasks (e.g.,
biomedicine). Although domain-specific pre-training enhances efficiency and
leads to smaller models, the computational costs of training these LLMs remain
high, posing budgeting challenges. We introduce MediSwift, a suite of
biomedical LMs that leverage sparse pre-training on domain-specific biomedical
text data. By inducing up to 75% weight sparsity during the pre-training phase,
MediSwift achieves a 2-2.5x reduction in training FLOPs. Notably, all sparse
pre-training was performed on the Cerebras CS-2 system, which is specifically
designed to realize the acceleration benefits from unstructured weight
sparsity, thereby significantly enhancing the efficiency of the MediSwift
models. Through subsequent dense fine-tuning and strategic soft prompting,
MediSwift models outperform existing LLMs of up to 7B parameters on biomedical
tasks, setting new benchmarks for the efficiency-accuracy trade-off on tasks such as
PubMedQA. Our results show that sparse pre-training, along with dense
fine-tuning and soft prompting, offers an effective method for creating
high-performing, computationally efficient models in specialized domains.
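The recipe the abstract describes (sparse pre-training, then dense fine-tuning) can be illustrated with a minimal PyTorch sketch on a stand-in model. This is an assumption-laden illustration, not the authors' implementation: the model, layer sizes, and training loops are placeholders, and MediSwift's actual pre-training runs on the Cerebras CS-2.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(              # stand-in for a Transformer LM
    nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)
)

# 1) Sparse pre-training: induce 75% unstructured weight sparsity up front.
#    The pruning masks keep those weights at zero throughout pre-training,
#    which is where the reported 2-2.5x FLOP reduction comes from.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.random_unstructured(module, name="weight", amount=0.75)

# ... run pre-training on biomedical text here ...

# 2) Dense fine-tuning: bake the masks in and drop the reparameterization so
#    every weight (including previously pruned ones) is trainable again.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# ... run task-specific fine-tuning (and, per the abstract, soft prompting) here ...
```

Note that on most GPUs unstructured sparsity does not translate into wall-clock speedups; the quoted FLOP savings rely on hardware, such as the CS-2, that can exploit unstructured weight sparsity.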
Related papers
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is the onerous cost of pre-training.
This paper explores a promising paradigm for improving LLM pre-training efficiency and quality by leveraging a small language model (SLM).
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on pre-training loss as a more efficient metric for performance estimation.
We extend the power-law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance (a minimal curve-fitting sketch appears after this list).
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations [4.72457683445805]
Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability.
This study investigates the effectiveness of various adapter techniques equivalent to Low-Rank Adaptation (LoRA); a minimal LoRA sketch appears after this list.
We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models.
arXiv Detail & Related papers (2024-07-27T16:48:03Z)
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z)
- Developing Healthcare Language Model Embedding Spaces [0.20971479389679337]
Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets such as healthcare-focused text.
Three methods are assessed: traditional masked language modeling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel pre-training objective utilizing metadata categories from the healthcare setting.
Contrastively trained models outperform other approaches on the classification tasks, delivering strong performance from limited labeled data and with fewer model parameter updates required.
arXiv Detail & Related papers (2024-03-28T19:31:32Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- DB-LLM: Accurate Dual-Binarization for Efficient LLMs [83.70686728471547]
Large language models (LLMs) have significantly advanced the field of natural language processing.
Existing ultra-low-bit quantization always causes severe accuracy drops.
We propose a novel Dual-Binarization method for LLMs, namely DB-LLM.
arXiv Detail & Related papers (2024-02-19T09:04:30Z)
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length [65.24730341801468]
This paper introduces a novel, simple, and effective method named GrowLength to accelerate the pretraining process of Large Language Models.
Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency (a toy schedule sketch appears after this list).
arXiv Detail & Related papers (2023-10-01T05:25:24Z)
- Improving Small Language Models on PubMedQA via Generative Data Augmentation [4.96649519549027]
Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing.
Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data.
We introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation.
arXiv Detail & Related papers (2023-05-12T23:49:23Z)
- Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction [12.669003066030697]
We propose Med-BERT, which adapts the BERT framework for pre-training contextualized embedding models on structured diagnosis data from an EHR dataset of 28,490,650 patients.
Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 2.02-7.12%.
In particular, pre-trained Med-BERT substantially improves performance on tasks with very small fine-tuning training sets (300-500 samples), boosting the AUC by more than 20%, equivalent to the AUC obtained with a training set 10 times larger.
arXiv Detail & Related papers (2020-05-22T05:07:17Z)
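For the "Scaling Laws for Predicting Downstream Performance in LLMs" entry above, the first stage (fitting a power law from training FLOPs to domain-specific pre-training loss) can be sketched as below. The functional form, the reference scale C0, the initial guess, and the toy data points are assumptions for illustration; the paper's second stage, a two-layer network mapping multiple domain losses to downstream performance, is omitted here.

```python
import numpy as np
from scipy.optimize import curve_fit

C0 = 1e18  # reference compute scale, keeps the fit numerically well-behaved

def power_law(flops, a, b, c):
    # L(C) = a * (C / C0)^(-b) + c : loss decays with compute toward a floor c
    return a * np.power(flops / C0, -b) + c

# Toy (FLOPs, loss) observations from small pilot runs -- placeholders.
flops = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss = np.array([3.10, 2.85, 2.62, 2.45, 2.31])

params, _ = curve_fit(power_law, flops, loss, p0=(1.0, 0.3, 2.0))
print("predicted loss at 1e21 FLOPs:", round(float(power_law(1e21, *params)), 3))
```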
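For the LoRA adapters entry above, a minimal sketch of attaching LoRA adapters to a biomedical encoder with Hugging Face's peft library might look like the following. The checkpoint id, rank, and target modules are assumptions for illustration, not the paper's configuration.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Hypothetical biomedical checkpoint id; substitute the model you actually use.
model = AutoModelForSequenceClassification.from_pretrained(
    "almanach/camembert-bio-base", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                     # rank of the low-rank adapter matrices
    lora_alpha=16,           # scaling applied to the adapter update
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in BERT-style encoders
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
# ... fine-tune on the clinical classification data with a standard training loop ...
```

Because only the adapter matrices receive gradients, this keeps memory and data requirements low, which is the data-limited setting the paper studies.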
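For the GrowLength entry above, the progressive training-length idea boils down to a schedule that maps training progress to a sequence length: short sequences make early steps cheap, longer ones restore full context later. The stage boundaries and lengths below are illustrative assumptions, not the paper's schedule.

```python
from bisect import bisect_right

# (fraction of total steps completed, sequence length to use) -- assumed stages.
STAGES = [(0.0, 256), (0.4, 512), (0.7, 1024), (0.9, 2048)]

def sequence_length(step: int, total_steps: int) -> int:
    """Return the training sequence length to use at the given step."""
    progress = step / total_steps
    idx = bisect_right([frac for frac, _ in STAGES], progress) - 1
    return STAGES[max(idx, 0)][1]

# Example: a 100k-step run spends its first 40% of steps at length 256.
for step in (0, 39_999, 40_000, 90_000):
    print(step, sequence_length(step, 100_000))
```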