CancerLLM: A Large Language Model in Cancer Domain
- URL: http://arxiv.org/abs/2406.10459v3
- Date: Tue, 01 Apr 2025 02:23:57 GMT
- Title: CancerLLM: A Large Language Model in Cancer Domain
- Authors: Mingchen Li, Jiatan Huang, Jeremy Yeung, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang
- Abstract summary: CancerLLM is a model with 7 billion parameters and a Mistral-style architecture, pre-trained on nearly 2.7M clinical notes and over 515K pathology reports covering 17 cancer types. It achieves state-of-the-art results, with an F1 score of 91.78% on phenotyping extraction and 86.81% on diagnosis generation.
- Score: 17.696798724373934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical Large Language Models (LLMs) have demonstrated impressive performance on a wide variety of medical NLP tasks; however, there is still no LLM specifically designed for phenotyping identification and diagnosis in the cancer domain. Moreover, these LLMs typically have several billions of parameters, making them computationally expensive for healthcare systems. Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on nearly 2.7M clinical notes and over 515K pathology reports covering 17 cancer types, followed by fine-tuning on two cancer-relevant tasks, including cancer phenotypes extraction and cancer diagnosis generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results, with an F1 score of 91.78% on phenotyping extraction and 86.81% on diagnosis generation. It outperformed existing LLMs, with an average F1 score improvement of 9.23%. Additionally, CancerLLM demonstrated its efficiency in time and GPU usage, and its robustness compared with other LLMs. We demonstrated that CancerLLM can potentially provide an effective and robust solution to advance clinical research and practice in the cancer domain.
Related papers
- ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports.
Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z)
- Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging [65.83291923029985]
Prostate cancer (PCa) is the most prevalent cancer among men in the United States, accounting for nearly 300,000 cases, 29% of all diagnoses and 35,000 total deaths in 2024.
Traditional screening methods such as prostate-specific antigen (PSA) testing and magnetic resonance imaging (MRI) have been pivotal in diagnosis, but have faced limitations in specificity and generalizability.
We employ several state-of-the-art deep learning models, including U-Net, SegResNet, Swin UNETR, Attention U-Net, and LightM-UNet, to segment PCa lesions from a 200 CDI$
arXiv Detail & Related papers (2025-01-15T22:23:41Z)
- Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model [19.252851972152957]
We report a mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure.
MOME demonstrated accurate and robust identification of breast cancer.
It could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694.
arXiv Detail & Related papers (2024-08-08T05:04:13Z)
- A Large Language Model Pipeline for Breast Cancer Oncology [0.0]
State-of-the-art OpenAI models were fine-tuned on a clinical dataset and clinical guidelines text corpus for two important cancer treatment factors.
A high accuracy (0.85+) was achieved in the classification of adjuvant radiation therapy and chemotherapy for breast cancer patients.
arXiv Detail & Related papers (2024-06-10T16:44:48Z)
- Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z)
- Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging [71.91773485443125]
Grading plays a vital role in breast cancer treatment planning.
The current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs.
This paper examines using optimized CDI$^s$ to improve breast cancer grade prediction.
arXiv Detail & Related papers (2024-05-13T15:48:26Z)
- Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure [21.660602700862714]
This study examined machine learning models to identify cancer patients at risk of heart failure.
We identified a cancer cohort of 12,806 patients from the University of Florida Health, diagnosed with lung, breast, and colorectal cancers.
The proposed narrative features remarkably increased feature density and improved performance.
arXiv Detail & Related papers (2024-03-18T02:42:01Z)
- Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging [82.74877848011798]
The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023.
The gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy.
In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion (CDI$^s$) imaging.
arXiv Detail & Related papers (2023-04-12T15:08:34Z)
- A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [82.74877848011798]
Cancer-Net BCa is a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients.
Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
arXiv Detail & Related papers (2023-04-12T05:41:44Z)
- CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans [45.83431075462771]
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice.
Most medical AI systems are built to focus on single organs with a narrow list of a few diseases.
CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction.
arXiv Detail & Related papers (2023-01-28T20:09:34Z)
- Improving Precancerous Case Characterization via Transformer-based Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has been focused on detecting cancer cases.
Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention.
Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
arXiv Detail & Related papers (2022-12-10T00:06:28Z)
- Enhancing Clinical Support for Breast Cancer with Deep Learning Models using Synthetic Correlated Diffusion Imaging [66.63200823918429]
We investigate enhancing clinical support for breast cancer with deep learning models.
We leverage a volumetric convolutional neural network to learn deep radiomic features from a pre-treatment cohort.
We find that the proposed approach can achieve better performance for both grade and post-treatment response prediction.
arXiv Detail & Related papers (2022-11-10T03:02:12Z)
- Machine Learning-based Lung and Colon Cancer Detection using Deep Feature Extraction and Ensemble Learning [0.9786690381850355]
We introduce a hybrid ensemble feature extraction model to efficiently identify lung and colon cancer.
It integrates deep feature extraction and ensemble learning with high-performance filtering for cancer image datasets.
Our model can detect lung, colon, and (lung and colon) cancer with accuracy rates of 99.05%, 100%, and 99.30%, respectively.
arXiv Detail & Related papers (2022-06-02T15:14:41Z)
- CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis [1.63324350193061]
We propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI.
First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features.
Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer.
arXiv Detail & Related papers (2020-07-31T23:44:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.