CORAL: Expert-Curated medical Oncology Reports to Advance Language Model
Inference
- URL: http://arxiv.org/abs/2308.03853v3
- Date: Thu, 11 Jan 2024 16:45:56 GMT
- Title: CORAL: Expert-Curated medical Oncology Reports to Advance Language Model
Inference
- Authors: Madhumita Sushil, Vanessa E. Kennedy, Divneet Mandair, Brenda Y. Miao,
Travis Zack, Atul J. Butte
- Abstract summary: Large language models (LLMs) have recently exhibited impressive performance on various medical natural language processing tasks.
We developed a detailed schema for annotating textual oncology information, encompassing patient characteristics, tumor characteristics, tests, treatments, and temporality.
The GPT-4 model exhibited overall best performance, with an average BLEU score of 0.73, an average ROUGE score of 0.72, an exact-match F1-score of 0.51, and an average accuracy of 68% on complex tasks.
- Score: 2.1067045507411195
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Both medical care and observational studies in oncology require a thorough
understanding of a patient's disease progression and treatment history, often
elaborately documented in clinical notes. Despite their vital role, no current
oncology information representation and annotation schema fully encapsulates
the diversity of information recorded within these notes. Although large
language models (LLMs) have recently exhibited impressive performance on
various medical natural language processing tasks, due to the current lack of
comprehensively annotated oncology datasets, an extensive evaluation of LLMs in
extracting and reasoning with the complex rhetoric in oncology notes remains
understudied. We developed a detailed schema for annotating textual oncology
information, encompassing patient characteristics, tumor characteristics,
tests, treatments, and temporality. Using a corpus of 40 de-identified breast
and pancreatic cancer progress notes at University of California, San
Francisco, we applied this schema to assess the zero-shot abilities of three
recent LLMs (GPT-4, GPT-3.5-turbo, and FLAN-UL2) to extract detailed
oncological history from two narrative sections of clinical progress notes. Our
team annotated 9028 entities, 9986 modifiers, and 5312 relationships. The GPT-4
model exhibited overall best performance, with an average BLEU score of 0.73,
an average ROUGE score of 0.72, an exact-match F1-score of 0.51, and an average
accuracy of 68% on complex tasks (expert manual evaluation on subset). Notably,
it was proficient in tumor characteristic and medication extraction, and
demonstrated superior performance in relational inference like adverse event
detection. However, further improvements are needed before using it to reliably
extract important facts from cancer progress notes needed for clinical
research, complex population management, and documenting quality patient care.
Related papers
- A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients [19.777109737517996]
This research aims to explore how large language models (LLMs) can alleviate the burden of manual summarization.
This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8b, in generating discharge summaries.
arXiv Detail & Related papers (2024-11-06T10:02:50Z) - SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research [45.2233252981348]
Large Language Models have shown promising results in their ability to encode general medical knowledge.
We test the ability of state-of-the-art LLMs to leverage their internal knowledge and reasoning for epilepsy diagnosis.
arXiv Detail & Related papers (2024-07-03T11:02:12Z) - Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources [13.750202656564907]
Adverse event (AE) extraction is crucial for monitoring and analyzing the safety profiles of immunizations.
This study aims to evaluate the effectiveness of large language models (LLMs) and traditional deep learning models in AE extraction.
arXiv Detail & Related papers (2024-06-26T03:56:21Z) - Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z) - Histopathologic Cancer Detection [0.0]
This work uses the PatchCamelyon benchmark datasets and trains them in a multi-layer perceptron and convolution model to observe the model's performance in terms of precision Recall, F1 Score, Accuracy, and AUC Score.
Also, this paper introduced ResNet50 and InceptionNet models with data augmentation, where ResNet50 is able to beat the state-of-the-art model.
arXiv Detail & Related papers (2023-11-13T19:51:46Z) - Zero-shot Learning with Minimum Instruction to Extract Social
Determinants and Family History from Clinical Notes using GPT Model [4.72294159722118]
This research focuses on investigating the zero-shot learning on extracting this information together.
We utilize de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information.
Our results show that the GPT-3.5 method achieved an average of 0.975 F1 on demographics extraction, 0.615 F1 on social determinants extraction, and 0.722 F1 on family history extraction.
arXiv Detail & Related papers (2023-09-11T14:16:27Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - Foresight -- Deep Generative Modelling of Patient Timelines using
Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
arXiv Detail & Related papers (2022-12-13T19:06:00Z) - Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z) - Human Evaluation and Correlation with Automatic Metrics in Consultation
Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par if not better than common model-based metrics like BertScore.
arXiv Detail & Related papers (2022-04-01T14:04:16Z) - A Systematic Review of Natural Language Processing Applied to Radiology
Reports [3.600747505433814]
This study systematically assesses recent literature in NLP applied to radiology reports.
Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
arXiv Detail & Related papers (2021-02-18T18:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.