Iterative Prompt Refinement for Radiation Oncology Symptom Extraction
Using Teacher-Student Large Language Models
- URL: http://arxiv.org/abs/2402.04075v1
- Date: Tue, 6 Feb 2024 15:25:09 GMT
- Title: Iterative Prompt Refinement for Radiation Oncology Symptom Extraction
Using Teacher-Student Large Language Models
- Authors: Reza Khanmohammadi, Ahmed I Ghanem, Kyle Verdecchia, Ryan Hall,
Mohamed Elshaikh, Benjamin Movsas, Hassan Bagher-Ebadian, Indrin Chetty,
Mohammad M. Ghassemi, Kundan Thind
- Abstract summary: Mixtral, the student model, initially extracts symptoms, followed by GPT-4, the teacher model, which refines prompts based on Mixtral's performance.
Results showed significant improvements in extracting symptoms from both single and multi-symptom notes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study introduces a novel teacher-student architecture utilizing Large
Language Models (LLMs) to improve prostate cancer radiotherapy symptom
extraction from clinical notes. Mixtral, the student model, initially extracts
symptoms, followed by GPT-4, the teacher model, which refines prompts based on
Mixtral's performance. This iterative process involved 294 single symptom
clinical notes across 12 symptoms, with up to 16 rounds of refinement per
epoch. Results showed significant improvements in extracting symptoms from both
single and multi-symptom notes. For 59 single symptom notes, accuracy increased
from 0.51 to 0.71, precision from 0.52 to 0.82, recall from 0.52 to 0.72, and
F1 score from 0.49 to 0.73. In 375 multi-symptom notes, accuracy rose from 0.24
to 0.43, precision from 0.60 to 0.76, recall from 0.24 to 0.43, and F1 score
from 0.20 to 0.44. These results demonstrate the effectiveness of advanced
prompt engineering in LLMs for use in radiation oncology.
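The teacher-student loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `student_extract`, `teacher_refine_prompt`, and `evaluate` are hypothetical stand-ins for calls to Mixtral, GPT-4, and a scoring routine, respectively.

```python
MAX_ROUNDS = 16  # up to 16 refinement rounds per epoch, per the abstract

def refine_prompt(notes, labels, prompt,
                  student_extract, teacher_refine_prompt, evaluate):
    """Iteratively refine an extraction prompt, keeping the best-scoring one."""
    best_prompt, best_score = prompt, 0.0
    for _ in range(MAX_ROUNDS):
        # Student (Mixtral) extracts symptoms from each note with the current prompt.
        predictions = [student_extract(prompt, note) for note in notes]
        score = evaluate(predictions, labels)
        if score > best_score:
            best_prompt, best_score = prompt, score
        # Teacher (GPT-4) inspects the student's errors and rewrites the prompt.
        prompt = teacher_refine_prompt(prompt, notes, predictions, labels)
    return best_prompt, best_score
```

In practice the teacher call would include the student's mistakes in its context so the rewritten prompt targets the observed failure modes; the stopping criterion here (fixed round budget, keep the best prompt seen) is an assumption.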
Related papers
- LLM Assistance for Pediatric Depression [2.1398676192061683]
This work investigates the feasibility of state-of-the-art LLMs for depressive symptom extraction in pediatric settings (ages 6-24).
Our findings show that all LLMs are 60% more efficient than word matching, with Flan leading in precision (average F1: 0.65, precision: 0.78) and excelling in the extraction of rarer symptoms like "sleep problems" (F1: 0.92) and "self-loathing" (F1: 0.8).
Llama 3, with the highest recall (0.90), overgeneralizes symptoms, making it less suitable for this type of analysis.
arXiv Detail & Related papers (2025-01-29T09:27:27Z) - Distilling Large Language Models for Efficient Clinical Information Extraction [2.953317125529822]
We evaluate the performance of distilled BERT models, which are approximately 1,000 times smaller than modern LLMs.
We leveraged state-of-the-art LLMs (Gemini and OpenAI models) and medical ontologies (RxNorm and SNOMED) as teacher labelers for medication, disease, and symptom extraction.
We applied our approach to over 3,300 clinical notes spanning five publicly available datasets.
arXiv Detail & Related papers (2024-12-21T02:15:29Z) - Diagnostic Performance of Deep Learning for Predicting Gliomas' IDH and 1p/19q Status in MRI: A Systematic Review and Meta-Analysis [0.0]
Gliomas are the most common primary brain tumors.
Molecular profiling is critical for diagnosis, treatment, and prognosis.
This review evaluates MRI-based deep learning (DL) models' efficacy in predicting these biomarkers.
arXiv Detail & Related papers (2024-10-28T13:39:52Z) - Hybrid Student-Teacher Large Language Model Refinement for Cancer Toxicity Symptom Extraction [3.564938069395287]
Large Language Models (LLMs) offer significant potential for clinical symptom extraction, but their deployment in healthcare settings is constrained by privacy concerns, computational limitations, and operational costs.
This study investigates the optimization of compact LLMs for cancer toxicity symptom extraction using a novel iterative refinement approach.
arXiv Detail & Related papers (2024-08-08T22:18:01Z) - AXIAL: Attention-based eXplainability for Interpretable Alzheimer's Localized Diagnosis using 2D CNNs on 3D MRI brain scans [43.06293430764841]
This study presents an innovative method for Alzheimer's disease diagnosis using 3D MRI designed to enhance the explainability of model decisions.
Our approach adopts a soft attention mechanism, enabling 2D CNNs to extract volumetric representations.
With voxel-level precision, our method identifies which specific brain regions the model attends to most.
arXiv Detail & Related papers (2024-07-02T16:44:00Z) - Improving Large Language Models for Clinical Named Entity Recognition
via Prompt Engineering [20.534197056683695]
This study quantifies the capabilities of GPT-3.5 and GPT-4 for clinical named entity recognition (NER) tasks.
We developed a task-specific prompt framework that includes baseline prompts, annotation guideline-based prompts, error analysis-based instructions, and annotated samples.
We assessed each prompt's effectiveness and compared the models to BioClinicalBERT.
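A hedged sketch of how such a task-specific prompt might be assembled from the four components the study names (baseline prompt, annotation guidelines, error-analysis instructions, and annotated samples); the section labels and ordering here are assumptions for illustration, not the authors' actual prompts.

```python
def build_ner_prompt(baseline, guidelines, error_notes, examples, note):
    """Assemble a clinical NER prompt from the four components described above.

    `examples` is a list of (note, entities) pairs used as few-shot samples.
    """
    shots = "\n".join(f"Note: {n}\nEntities: {e}" for n, e in examples)
    return (
        f"{baseline}\n\n"
        f"Annotation guidelines:\n{guidelines}\n\n"
        f"Common errors to avoid:\n{error_notes}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Note: {note}\nEntities:"
    )
```

The assembled string would then be sent to GPT-3.5 or GPT-4 as the extraction prompt; which components actually help is the study's empirical question.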
arXiv Detail & Related papers (2023-03-29T02:46:18Z) - Attention-based Saliency Maps Improve Interpretability of Pneumothorax
Classification [52.77024349608834]
To investigate chest radiograph (CXR) classification performance of vision transformers (ViT) and interpretability of attention-based saliency.
ViTs were fine-tuned for lung disease classification using four public data sets: CheXpert, Chest X-Ray 14, MIMIC CXR, and VinBigData.
ViTs had comparable CXR classification AUCs compared with state-of-the-art CNNs.
arXiv Detail & Related papers (2023-03-03T12:05:41Z) - Learning to diagnose cirrhosis from radiological and histological labels
with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve cirrhosis prediction.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z) - DeepCOVID-Fuse: A Multi-modality Deep Learning Model Fusing Chest
X-Radiographs and Clinical Variables to Predict COVID-19 Risk Levels [8.593516170110203]
DeepCOVID-Fuse is a deep learning fusion model to predict risk levels in coronavirus patients.
The accuracy of DeepCOVID-Fuse trained on CXRs and clinical variables is 0.658, with an AUC of 0.842.
arXiv Detail & Related papers (2023-01-20T20:54:25Z) - Comparison of Machine Learning Classifiers to Predict Patient Survival
and Genetics of GBM: Towards a Standardized Model for Clinical Implementation [44.02622933605018]
Radiomic models have been shown to outperform clinical data for outcome prediction in glioblastoma (GBM)
We aimed to compare nine machine learning classifiers to predict overall survival (OS), isocitrate dehydrogenase (IDH) mutation, O-6-methylguanine-DNA-methyltransferase (MGMT) promoter methylation, epidermal growth factor receptor (EGFR) VII amplification and Ki-67 expression in GBM patients.
xGB obtained maximum accuracy for OS (74.5%), AB for IDH mutation (88%), MGMT methylation (71.7%), Ki-67 expression (86.6%), and EGFR amplification (81,
arXiv Detail & Related papers (2021-02-10T15:10:37Z) - Attention-Based LSTM Network for COVID-19 Clinical Trial Parsing [0.0]
We train attention-based bidirectional Long Short-Term Memory (Att-BiLSTM) models and use the optimal model to extract entities from the eligibility criteria of COVID-19 trials.
We compare the performance of Att-BiLSTM with a traditional ontology-based method.
Our analyses demonstrate that Att-BiLSTM is an effective approach for characterizing patient populations in COVID-19 clinical trials.
arXiv Detail & Related papers (2020-12-18T05:55:52Z) - Multilabel 12-Lead Electrocardiogram Classification Using Gradient
Boosting Tree Ensemble [64.29529357862955]
We build an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnosis.
For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform.
We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class.
arXiv Detail & Related papers (2020-10-21T18:11:36Z) - Joint Prediction and Time Estimation of COVID-19 Developing Severe
Symptoms using Chest CT Scan [49.209225484926634]
We propose a joint classification and regression method to determine whether the patient would develop severe symptoms in the later time.
To do this, the proposed method takes into account 1) the weight for each sample, to reduce outliers' influence and address the imbalanced-classification problem.
Our proposed method yields 76.97% of accuracy for predicting the severe cases, 0.524 of the correlation coefficient, and 0.55 days difference for the converted time.
arXiv Detail & Related papers (2020-05-07T12:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.