The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations
- URL: http://arxiv.org/abs/2407.19299v1
- Date: Sat, 27 Jul 2024 16:48:03 GMT
- Title: The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations
- Authors: Thanh-Dung Le, Ti Ti Nguyen, Vu Nguyen Ha, et al.
- Abstract summary: Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability.
This study investigates the effectiveness of various adapter techniques, equivalent to Low-Rank Adaptation (LoRA), for fine-tuning LLMs in a resource-constrained hospital environment.
We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models.
- Score: 4.72457683445805
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability. This study investigates the effectiveness of various adapter techniques, equivalent to Low-Rank Adaptation (LoRA), for fine-tuning LLMs in a resource-constrained hospital environment. We experimented with four structures, namely Adapter, Lightweight, TinyAttention, and Gated Residual Network (GRN), as final layers for clinical note classification. We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models. Our extensive experimental results indicate that i) employing adapter structures does not yield significant improvements in fine-tuning biomedical pre-trained LLMs, and ii) simpler Transformer-based models, trained from scratch, perform better under resource constraints. Among the adapter structures, the GRN demonstrated superior performance, with accuracy, precision, recall, and F1 score of 0.88. Moreover, the total training time for the LLMs exceeded 1000 hours, compared to under 6 hours for the simpler Transformer-based models, highlighting that LLMs are better suited to environments with extensive computational resources and larger datasets. Consequently, this study demonstrates that simpler Transformer-based models can be effectively trained from scratch, providing a viable solution for clinical NLP tasks in low-resource environments with limited data availability. By identifying the GRN as the most effective adapter structure, we offer a practical approach to enhancing clinical note classification without requiring extensive computational resources.
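The abstract does not spell out how the adapter structures are wired into the classifiers, so the sketch below is only a minimal illustration of what a GRN-style final layer over a frozen biomedical encoder (e.g. CamemBERT-bio loaded through Hugging Face Transformers) could look like. The layer sizes, the GLU-based gating, and the [CLS] pooling are assumptions for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Minimal GRN block: feed-forward expansion, GLU-style gate,
    residual connection, and layer normalisation (sizes are assumptions)."""
    def __init__(self, d_model: int, d_hidden: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        self.gate = nn.Linear(d_model, 2 * d_model)   # produces value + gate halves for GLU
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.elu(self.fc1(x))
        h = self.dropout(self.fc2(h))
        h = F.glu(self.gate(h), dim=-1)               # gated non-linearity back to d_model
        return self.norm(x + h)                       # residual + layer norm

class GRNClassifier(nn.Module):
    """GRN adapter used as the final classification layer over a frozen encoder.
    Assumes a Hugging Face-style encoder returning `last_hidden_state`."""
    def __init__(self, encoder: nn.Module, d_model: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():           # keep the pre-trained backbone frozen
            p.requires_grad = False
        self.grn = GatedResidualNetwork(d_model, d_hidden=2 * d_model)
        self.out = nn.Linear(d_model, num_classes)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        cls = hidden[:, 0]                            # [CLS]-style pooled representation
        return self.out(self.grn(cls))                # logits over clinical note classes
```

Only the GRN head and the output projection are trained here, which is in the spirit of the paper's low-resource setting; reproducing the reported 0.88 accuracy, precision, recall, and F1 would depend on details (tokenisation, class balance, hyperparameters) that the abstract does not provide.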
Related papers
- Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
This study proposes using large language models (LLMs) to elicit expert prior distributions for predictive models.
We compare LLM-elicited and uninformative priors, evaluate whether LLMs truthfully generate parameter distributions, and propose a model selection strategy for in-context learning and prior elicitation.
Our findings show that LLM-elicited prior parameter distributions significantly reduce predictive error compared to uninformative priors in low-data settings.
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - STAL: Spike Threshold Adaptive Learning Encoder for Classification of Pain-Related Biosignal Data [2.0738462952016232]
This paper presents the first application of spiking neural networks (SNNs) for the classification of chronic lower back pain (CLBP) using the EmoPain dataset.
We introduce Spike Threshold Adaptive Learning (STAL), a trainable encoder that effectively converts continuous biosignals into spike trains.
We also propose an ensemble of Spiking Recurrent Neural Network (SRNN) classifiers for the multi-stream processing of sEMG and IMU data.
arXiv Detail & Related papers (2024-07-11T10:15:52Z) - Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research [20.285114234576298]
Large language models (LLMs) are promising for biomedical and healthcare research.
We propose a collection of fine-tuned LLMs and multimodal LLMs (MLLMs) for three novel tasks in genomic and proteomic research.
The models in Geneverse are trained and evaluated based on domain-specific datasets.
We demonstrate that adapted LLMs and MLLMs perform well for these tasks and may outperform closed-source large-scale models.
arXiv Detail & Related papers (2024-06-21T14:19:10Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - A comparative study of zero-shot inference with large language models
and supervised modeling in breast cancer pathology classification [1.4715634464004446]
Large language models (LLMs) have demonstrated promising transfer learning capability.
LLMs demonstrated the potential to speed up clinical NLP studies by reducing the need to curate large annotated datasets.
This may result in an increase in the utilization of NLP-based variables and outcomes in observational clinical studies.
arXiv Detail & Related papers (2024-01-25T02:05:31Z) - Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z) - Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning [54.682106515794864]
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets.
This paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to use pre-trained Language Models (LMs) for offline RL.
Empirical results indicate LaMo achieves state-of-the-art performance in sparse-reward tasks.
arXiv Detail & Related papers (2023-10-31T16:24:17Z) - Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines [4.876281217951695]
NLP-based computer vision models, particularly vision transformers, have been shown to outperform CNN models in many imaging tasks.
We developed digital pathology pipelines to benchmark the five most recently proposed NLP models and four popular CNN models.
Our NLP models achieved state-of-the-art predictions for all three biomarkers using a relatively small training dataset.
arXiv Detail & Related papers (2023-02-21T02:42:03Z) - Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.