CancerLLM: A Large Language Model in Cancer Domain
- URL: http://arxiv.org/abs/2406.10459v1
- Date: Sat, 15 Jun 2024 01:02:48 GMT
- Title: CancerLLM: A Large Language Model in Cancer Domain
- Authors: Mingchen Li, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang
- Abstract summary: CancerLLM is a model with 7 billion parameters and a Mistral-style architecture, pre-trained on 2,676,642 clinical notes and 515,524 pathology reports covering 17 cancer types.
Our evaluation demonstrated that CancerLLM achieves state-of-the-art results compared to other existing LLMs, with an average F1 score improvement of 8.1%.
- Score: 19.384643526294127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical Large Language Models (LLMs) such as ClinicalCamel 70B and Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP tasks. However, there is still no large language model (LLM) specifically designed for the cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally expensive for healthcare systems. Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on 2,676,642 clinical notes and 515,524 pathology reports covering 17 cancer types, followed by fine-tuning on three cancer-relevant tasks: cancer phenotype extraction, cancer diagnosis generation, and cancer treatment plan generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results compared to other existing LLMs, with an average F1 score improvement of 8.1%. Additionally, CancerLLM outperforms other models on two proposed robustness testbeds. This illustrates that CancerLLM can be effectively applied to clinical AI systems, enhancing clinical research and healthcare delivery in the field of cancer.
Related papers
- A Large Language Model Pipeline for Breast Cancer Oncology [0.0]
State-of-the-art OpenAI models were fine-tuned on a clinical dataset and clinical guidelines text corpus for two important cancer treatment factors.
A high accuracy (0.85+) was achieved in the classification of adjuvant radiation therapy and chemotherapy for breast cancer patients.
arXiv Detail & Related papers (2024-06-10T16:44:48Z)
- Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z)
- Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging [71.91773485443125]
Grading plays a vital role in breast cancer treatment planning.
The current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs.
This paper examines using optimized CDI$^s$ to improve breast cancer grade prediction.
arXiv Detail & Related papers (2024-05-13T15:48:26Z)
- Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure [21.660602700862714]
This study examined machine learning models to identify cancer patients at risk of heart failure.
We identified a cancer cohort of 12,806 patients from the University of Florida Health, diagnosed with lung, breast, and colorectal cancers.
The proposed narrative features remarkably increased feature density and improved performance.
arXiv Detail & Related papers (2024-03-18T02:42:01Z)
- Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging [82.74877848011798]
The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023.
The gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy.
In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion imaging (CDI$^s$).
arXiv Detail & Related papers (2023-04-12T15:08:34Z)
- A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [82.74877848011798]
Cancer-Net BCa is a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients.
Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
arXiv Detail & Related papers (2023-04-12T05:41:44Z)
- CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans [45.83431075462771]
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice.
Most medical AI systems are built to focus on single organs with a narrow list of a few diseases.
CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction.
arXiv Detail & Related papers (2023-01-28T20:09:34Z)
- Improving Precancerous Case Characterization via Transformer-based Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has been focused on detecting cancer cases.
Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention.
Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
arXiv Detail & Related papers (2022-12-10T00:06:28Z)
- Enhancing Clinical Support for Breast Cancer with Deep Learning Models using Synthetic Correlated Diffusion Imaging [66.63200823918429]
We investigate enhancing clinical support for breast cancer with deep learning models.
We leverage a volumetric convolutional neural network to learn deep radiomic features from a pre-treatment cohort.
We find that the proposed approach can achieve better performance for both grade and post-treatment response prediction.
arXiv Detail & Related papers (2022-11-10T03:02:12Z)
- CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis [1.63324350193061]
We propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI.
First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features.
Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer.
arXiv Detail & Related papers (2020-07-31T23:44:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.