Related papers: CancerLLM: A Large Language Model in Cancer Domain

CancerLLM: A Large Language Model in Cancer Domain

URL: http://arxiv.org/abs/2406.10459v1
Date: Sat, 15 Jun 2024 01:02:48 GMT
Title: CancerLLM: A Large Language Model in Cancer Domain
Authors: Mingchen Li, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang,
Abstract summary: CancerLLM is a model with 7 billion parameters and a Mistral-style architecture, pre-trained on 2,676,642 clinical notes and 515,524 pathology reports covering 17 cancer types. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results compared to other existing LLMs, with an average F1 score improvement of 8.1%.
Score: 19.384643526294127
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical Large Language Models (LLMs) such as ClinicalCamel 70B, Llama3-OpenBioLLM 70B have demonstrated impressive performance on a wide variety of medical NLP task.However, there still lacks a large language model (LLM) specifically designed for cancer domain. Moreover, these LLMs typically have billions of parameters, making them computationally expensive for healthcare systems.Thus, in this study, we propose CancerLLM, a model with 7 billion parameters and a Mistral-style architecture, pre-trained on 2,676,642 clinical notes and 515,524 pathology reports covering 17 cancer types, followed by fine-tuning on three cancer-relevant tasks, including cancer phenotypes extraction, cancer diagnosis generation, and cancer treatment plan generation. Our evaluation demonstrated that CancerLLM achieves state-of-the-art results compared to other existing LLMs, with an average F1 score improvement of 8.1\%. Additionally, CancerLLM outperforms other models on two proposed robustness testbeds. This illustrates that CancerLLM can be effectively applied to clinical AI systems, enhancing clinical research and healthcare delivery in the field of cancer.

Related papers

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports. Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z)
Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging [65.83291923029985]
Prostate cancer (PCa) is the most prevalent cancer among men in the United States, accounting for nearly 300,000 cases, 29% of all diagnoses and 35,000 total deaths in 2024. Traditional screening methods such as prostate-specific antigen (PSA) testing and magnetic resonance imaging (MRI) have been pivotal in diagnosis, but have faced limitations in specificity and generalizability. We employ several state-of-the-art deep learning models, including U-Net, SegResNet, Swin UNETR, Attention U-Net, and LightM-UNet, to segment PCa lesions from a 200 CDI$
arXiv Detail & Related papers (2025-01-15T22:23:41Z)
Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model [19.252851972152957]
We report a mixture-of-modality-experts model (MOME) that integrates multiparametric MRI information within a unified structure. MOME demonstrated accurate and robust identification of breast cancer. It could reduce the need for biopsies in BI-RADS 4 patients with a ratio of 7.3%, classify triple-negative breast cancer with an AUROC of 0.709, and predict pathological complete response to neoadjuvant chemotherapy with an AUROC of 0.694.
arXiv Detail & Related papers (2024-08-08T05:04:13Z)
A Large Language Model Pipeline for Breast Cancer Oncology [0.0]
State-of-the-art OpenAI models were fine-tuned on a clinical dataset and clinical guidelines text corpus for two important cancer treatment factors. A high accuracy (0.85+) was achieved in the classification of adjuvant radiation therapy and chemotherapy for breast cancer patients.
arXiv Detail & Related papers (2024-06-10T16:44:48Z)
Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results. Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z)
Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging [71.91773485443125]
Grading plays a vital role in breast cancer treatment planning. The current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs. This paper examines using optimized CDI$s$ to improve breast cancer grade prediction.
arXiv Detail & Related papers (2024-05-13T15:48:26Z)
Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure [21.660602700862714]
This study examined machine learning models to identify cancer patients at risk of heart failure. We identified a cancer cohort of 12,806 patients from the University of Florida Health, diagnosed with lung, breast, and colorectal cancers. The proposed narrative features remarkably increased feature density and improved performance.
arXiv Detail & Related papers (2024-03-18T02:42:01Z)
Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging [82.74877848011798]
The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023. The gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy. In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion (CDI$s$) imaging.
arXiv Detail & Related papers (2023-04-12T15:08:34Z)
A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [82.74877848011798]
Cancer-Net BCa is a multi-institutional open-source benchmark dataset of volumetric CDI$s$ imaging data of breast cancer patients. Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
arXiv Detail & Related papers (2023-04-12T05:41:44Z)
CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans [45.83431075462771]
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice. Most medical AI systems are built to focus on single organs with a narrow list of a few diseases. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction.
arXiv Detail & Related papers (2023-01-28T20:09:34Z)
Improving Precancerous Case Characterization via Transformer-based Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has been focused on detecting cancer cases. Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention. Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
arXiv Detail & Related papers (2022-12-10T00:06:28Z)
Enhancing Clinical Support for Breast Cancer with Deep Learning Models using Synthetic Correlated Diffusion Imaging [66.63200823918429]
We investigate enhancing clinical support for breast cancer with deep learning models. We leverage a volumetric convolutional neural network to learn deep radiomic features from a pre-treatment cohort. We find that the proposed approach can achieve better performance for both grade and post-treatment response prediction.
arXiv Detail & Related papers (2022-11-10T03:02:12Z)
Machine Learning-based Lung and Colon Cancer Detection using Deep Feature Extraction and Ensemble Learning [0.9786690381850355]
We introduce a hybrid ensemble feature extraction model to efficiently identify lung and colon cancer. It integrates deep feature extraction and ensemble learning with high-performance filtering for cancer image datasets. Our model can detect lung, colon, and (lung and colon) cancer with accuracy rates of 99.05%, 100%, and 99.30%, respectively.
arXiv Detail & Related papers (2022-06-02T15:14:41Z)
CorrSigNet: Learning CORRelated Prostate Cancer SIGnatures from Radiology and Pathology Images for Improved Computer Aided Diagnosis [1.63324350193061]
We propose CorrSigNet, an automated two-step model that localizes prostate cancer on MRI. First, the model learns MRI signatures of cancer that are correlated with corresponding histopathology features. Second, the model uses the learned correlated MRI features to train a Convolutional Neural Network to localize prostate cancer.
arXiv Detail & Related papers (2020-07-31T23:44:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.