A comparative study of zero-shot inference with large language models
and supervised modeling in breast cancer pathology classification
- URL: http://arxiv.org/abs/2401.13887v1
- Date: Thu, 25 Jan 2024 02:05:31 GMT
- Title: A comparative study of zero-shot inference with large language models
and supervised modeling in breast cancer pathology classification
- Authors: Madhumita Sushil, Travis Zack, Divneet Mandair, Zhiwei Zheng, Ahmed
Wali, Yan-Ning Yu, Yuwei Quan, Atul J. Butte
- Abstract summary: Large language models (LLMs) have demonstrated promising transfer learning capability.
LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for curating large annotated datasets.
This may result in an increase in the utilization of NLP-based variables and outcomes in observational clinical studies.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although supervised machine learning is popular for information extraction
from clinical notes, creating large annotated datasets requires extensive
domain expertise and is time-consuming. Meanwhile, large language models (LLMs)
have demonstrated promising transfer learning capability. In this study, we
explored whether recent LLMs can reduce the need for large-scale data
annotations. We curated a manually-labeled dataset of 769 breast cancer
pathology reports, labeled with 13 categories, to compare zero-shot
classification capability of the GPT-4 model and the GPT-3.5 model with
supervised classification performance of three model architectures: a random
forest classifier, long short-term memory networks with attention (LSTM-Att),
and the UCSF-BERT model. Across all 13 tasks, the GPT-4 model performed either
significantly better than or as well as the best supervised model, the LSTM-Att
model (average macro F1 score of 0.83 vs. 0.75). On tasks with high imbalance
between labels, the differences were more prominent. Frequent sources of GPT-4
errors included inferences from multiple samples and complex task design. On
complex tasks where large annotated datasets cannot be easily collected, LLMs
can reduce the burden of large-scale data labeling. However, if the use of LLMs
is prohibitive, the use of simpler supervised models with large annotated
datasets can provide comparable results. LLMs demonstrated the potential to
speed up the execution of clinical NLP studies by reducing the need for
curating large annotated datasets. This may result in an increase in the
utilization of NLP-based variables and outcomes in observational clinical
studies.
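The headline comparison (average macro F1 of 0.83 for GPT-4 vs. 0.75 for LSTM-Att) rests on macro-averaged F1, which weights every label equally regardless of frequency; this is why the gap widens on the highly imbalanced tasks the abstract mentions. A minimal sketch of the metric, with invented toy labels purely for illustration (this is not the paper's evaluation code):

```python
def macro_f1(y_true, y_pred):
    """Macro F1: the unweighted mean of per-class F1 scores.

    Each class contributes equally regardless of how often it occurs,
    so performance on rare labels matters as much as on common ones.
    """
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy imbalanced task: 8 "neg" reports, 2 rare "pos" reports.
truth = ["neg"] * 8 + ["pos"] * 2
pred = ["neg"] * 9 + ["pos"]  # classifier misses one rare positive
print(round(macro_f1(truth, pred), 3))  # → 0.804
```

Missing a single rare positive drags the macro score down sharply (the "pos" class F1 falls to 0.667 even though 90% of examples are classified correctly), illustrating why a zero-shot model that handles rare labels well can outscore a supervised model on imbalanced tasks.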
Related papers
- A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients [19.777109737517996]
This research aims to explore how large language models (LLMs) can alleviate the burden of manual summarization.
This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8b, in generating discharge summaries.
arXiv Detail & Related papers (2024-11-06T10:02:50Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- The Impact of LoRA Adapters for LLMs on Clinical NLP Classification Under Data Limitations [4.72457683445805]
Fine-tuning Large Language Models (LLMs) for clinical Natural Language Processing (NLP) poses significant challenges due to the domain gap and limited data availability.
This study investigates the effectiveness of various adapter techniques, equivalent to Low-Rank Adaptation (LoRA).
We fine-tuned biomedical pre-trained models, including CamemBERT-bio, AliBERT, and DrBERT, alongside two Transformer-based models.
arXiv Detail & Related papers (2024-07-27T16:48:03Z)
- Is larger always better? Evaluating and prompting large language models for non-generative medical tasks [11.799956298563844]
This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models.
We focused on tasks such as readmission prediction, disease hierarchy reconstruction, and biomedical sentence matching.
Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies.
For unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks.
arXiv Detail & Related papers (2024-07-26T06:09:10Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data [3.9459077974367833]
Large language models (LLMs) have demonstrated remarkable success in NLP tasks.
We benchmarked one supervised classic machine learning model based on Support Vector Machines (SVMs), three supervised pretrained language models (PLMs) based on RoBERTa, BERTweet, and SocBERT, and two LLM-based classifiers (GPT-3.5 and GPT-4), across six text classification tasks.
Our comprehensive experiments demonstrate that employing data augmentation with LLMs (GPT-4) and relatively small human-annotated data to train lightweight supervised classification models achieves superior results compared to training with human-annotated data alone.
arXiv Detail & Related papers (2024-03-27T22:05:10Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator [63.762209407570715]
Genixer is a comprehensive data generation pipeline consisting of four key steps.
A synthetic VQA-like dataset trained with LLaVA1.5 enhances performance on 10 out of 12 multimodal benchmarks.
MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data.
arXiv Detail & Related papers (2023-12-11T09:44:41Z)
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM.
We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K which outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that dataset size, model parameters, and training objectives all play a significant role.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models [3.682742580232362]
Large pre-trained language models (LLMs) have been shown to have significant potential in few-shot learning across various fields.
Our research is the first to tackle drug pair synergy prediction in rare tissues with limited data.
arXiv Detail & Related papers (2023-04-18T02:49:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.