Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression Values
- URL: http://arxiv.org/abs/2306.06276v4
- Date: Thu, 16 May 2024 22:31:37 GMT
- Title: Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression Values
- Authors: Anchen Sun, Elizabeth J. Franzmann, Zhibin Chen, Xiaodong Cai
- Abstract summary: We train a classifier to categorize tumors into a high- or low-risk group of recurrence.
Our CL-based classifiers achieved an area under the receiver operating characteristic curve (AUC) greater than 0.8 for 14 types of cancer.
Our CLCox models trained with the TCGA data outperformed existing methods significantly in predicting the prognosis of 19 types of cancer.
- Score: 3.1298840947078372
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advancements in image classification have demonstrated that contrastive learning (CL) can aid in further learning tasks by acquiring good feature representations from a limited number of data samples. In this paper, we applied CL to tumor transcriptomes and clinical data to learn feature representations in a low-dimensional space. We then utilized these learned features to train a classifier to categorize tumors into a high- or low-risk group of recurrence. Using data from The Cancer Genome Atlas (TCGA), we demonstrated that CL can significantly improve classification accuracy. Specifically, our CL-based classifiers achieved an area under the receiver operating characteristic curve (AUC) greater than 0.8 for 14 types of cancer, and an AUC greater than 0.9 for 2 types of cancer. We also developed CL-based Cox (CLCox) models for predicting cancer prognosis. Our CLCox models trained with the TCGA data significantly outperformed existing methods in predicting the prognosis of the 19 types of cancer under consideration. The performance of the CLCox models and CL-based classifiers trained with TCGA lung and prostate cancer data was validated using data from two independent cohorts. We also show that the CLCox model trained with the whole transcriptome significantly outperforms the Cox model trained with the 21 genes of Oncotype DX, a test in clinical use for breast cancer patients. CL-based classifiers and CLCox models for 19 types of cancer are publicly available and can be used to predict cancer prognosis from the RNA-seq transcriptome of an individual tumor. Python code for model training and testing is also publicly accessible and can be applied to train new CL-based models using gene expression data of tumors.
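To make the contrastive step more concrete, here is a minimal sketch of a supervised contrastive (SupCon-style) loss over tumor embeddings. It assumes each tumor has already been mapped by some encoder to a low-dimensional vector and carries a binary risk label (0 = low risk, 1 = high risk); the function names `l2_normalize`, `dot`, and `supcon_loss`, the label encoding, and the temperature of 0.1 are all illustrative assumptions, not the authors' published code or their exact loss. The loss pulls same-label tumors together and pushes different-label tumors apart; the downstream classifier and Cox model are not shown.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(embeddings: Sequence[Sequence[float]],
                labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss, averaged over anchors that have at least one positive.

    embeddings: one feature vector per tumor (hypothetical encoder output).
    labels: risk group per tumor, e.g. 0 = low risk, 1 = high risk (assumed encoding).
    temperature: softmax temperature (assumed value, not taken from the paper).
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Log of the softmax denominator over all non-anchor samples (log-sum-exp for stability).
        logits = [dot(z[i], z[a]) / temperature for a in others]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        # Mean negative log-probability assigned to this anchor's positives.
        total += -sum(dot(z[i], z[p]) / temperature - log_denom
                      for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    aligned = supcon_loss(points, [0, 0, 1, 1])   # clusters agree with the labels
    shuffled = supcon_loss(points, [0, 1, 0, 1])  # clusters disagree with the labels
    print(aligned < shuffled)  # True
```

In the usage example, embeddings whose geometry already matches the risk labels yield a much smaller loss than the same points with shuffled labels; minimizing this quantity during training is what encourages an encoder to place high- and low-risk tumors in separable regions of the feature space.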
Related papers
- A Novel cVAE-Augmented Deep Learning Framework for Pan-Cancer RNA-Seq Classification [0.0]
We propose a novel deep learning framework that uses a class-conditional variational autoencoder (cVAE) to augment training data for pan-cancer gene expression classification.
We present detailed experimental results, including VAE training curves, performance metrics (ROC curves and confusion matrix), and architecture diagrams.
(2025-08-02T16:57:31Z)
- CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray [64.2434525370243]
The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays.
The CXR-LT 2024 expands the dataset to 377,110 chest X-rays (CXRs) and 45 disease labels, including 19 new rare disease findings.
This paper provides an overview of CXR-LT 2024, detailing the data curation process and consolidating state-of-the-art solutions.
(2025-06-09T17:53:31Z)
- Cancer-Net PCa-Seg: Benchmarking Deep Learning Models for Prostate Cancer Segmentation Using Synthetic Correlated Diffusion Imaging [65.83291923029985]
Prostate cancer (PCa) is the most prevalent cancer among men in the United States, accounting for nearly 300,000 cases, 29% of all diagnoses and 35,000 total deaths in 2024.
Traditional screening methods such as prostate-specific antigen (PSA) testing and magnetic resonance imaging (MRI) have been pivotal in diagnosis, but have faced limitations in specificity and generalizability.
We employ several state-of-the-art deep learning models, including U-Net, SegResNet, Swin UNETR, Attention U-Net, and LightM-UNet, to segment PCa lesions from a 200 CDI$
(2025-01-15T22:23:41Z)
- Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images [45.29301790646322]
Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization.
We propose CADe, a method for segmenting lung nodules in a zero-shot manner using a variant of the Segment Anything Model called MedSAM.
We also propose CADx, a method for characterizing nodules as benign or malignant by building a gallery of radiomic features and aligning image-feature pairs through contrastive learning.
(2024-07-02T19:30:25Z)
- Fairness Evolution in Continual Learning for Medical Imaging [47.52603262576663]
We study the behavior of Continual Learning (CL) strategies in medical imaging regarding classification performance.
We evaluate the Replay, Learning without Forgetting (LwF), LwF, and Pseudo-Label strategies.
LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased.
(2024-04-10T09:48:52Z)
- CIMIL-CRC: a clinically-informed multiple instance learning framework for patient-level colorectal cancer molecular subtypes classification from H&E stained images [42.771819949806655]
We introduce CIMIL-CRC, a framework that solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches.
We assessed our CIMIL-CRC method using the average area under the curve (AUC) from a 5-fold cross-validation experimental setup for model development on the TCGA-CRC-DX cohort.
(2024-01-29T12:56:11Z)
- A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [82.74877848011798]
Cancer-Net BCa is a multi-institutional open-source benchmark dataset of volumetric CDI^s imaging data of breast cancer patients.
Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
(2023-04-12T05:41:44Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second leading cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
(2023-01-28T15:03:03Z)
- Improving Precancerous Case Characterization via Transformer-based Ensemble Learning [31.891340667123124]
The application of natural language processing to cancer pathology reports has focused on detecting cancer cases.
Improving the characterization of precancerous adenomas assists in developing diagnostic tests for early cancer detection and prevention.
Our results demonstrated the potential of using NLP to leverage real-world health record data to facilitate the development of diagnostic tests for early cancer prevention.
(2022-12-10T00:06:28Z)
- Gene selection from microarray expression data: A Multi-objective PSO with adaptive K-nearest neighborhood [0.0]
This paper deals with the classification of human cancers using gene expression data.
A new methodology is presented to analyze microarray datasets and efficiently classify cancer diseases.
(2022-05-27T04:22:10Z)
- Deep Learning Based Model for Breast Cancer Subtype Classification [3.419451872918847]
This paper focuses on the use of gene expression data for the classification of breast cancer into four subtypes, Basal, Her2, LumA, and LumB.
The size of the feature set is reduced from 20,530 gene expression values to 500 by using an autoencoder.
By deploying the combined network of stages 1 and 2, we have been able to attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset.
(2021-11-06T17:15:35Z)
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
(2021-10-27T19:28:36Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential of a novel topology-based approach to capture the information contained in cancer genomic data.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
(2020-11-22T17:31:23Z)
- Machine Learning Against Cancer: Accurate Diagnosis of Cancer by Machine Learning Classification of the Whole Genome Sequencing Data [0.0]
We have developed novel methods of MLAC (Machine Learning Against Cancer) that achieve perfect precision, sensitivity, and specificity.
We have used the whole genome sequencing data acquired by next-generation RNA sequencing techniques in The Cancer Genome Atlas and Genotype-Tissue Expression projects for cancerous and healthy tissues respectively.
(2020-09-12T18:51:47Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
(2020-04-30T20:42:17Z)
- Iteratively Pruned Deep Learning Ensembles for COVID-19 Detection in Chest X-rays [3.785818062712446]
COVID-19 is caused by the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2).
A custom convolutional neural network and a selection of ImageNet pretrained models are trained and evaluated at patient-level.
The learned knowledge is transferred and fine-tuned to improve performance and generalization.
(2020-04-16T00:09:29Z)