Self-Normalizing Foundation Model for Enhanced Multi-Omics Data Analysis in Oncology
- URL: http://arxiv.org/abs/2405.08226v2
- Date: Sun, 03 Nov 2024 17:07:58 GMT
- Title: Self-Normalizing Foundation Model for Enhanced Multi-Omics Data Analysis in Oncology
- Authors: Asim Waqas, Aakash Tripathi, Sabeen Ahmed, Ashwin Mukund, Hamza Farooq, Matthew B. Schabath, Paul Stewart, Mia Naeini, Ghulam Rasool
- Abstract summary: SeNMo is a foundation model that has been trained on multi-omics data across 33 cancer types.
We trained SeNMo to predict patients' overall survival using pan-cancer multi-omics data covering 33 cancer sites.
SeNMo was validated on two independent cohorts: Moffitt Cancer Center and CPTAC lung squamous cell carcinoma.
- Score: 0.0
- Abstract: Multi-omics research has enhanced our understanding of cancer heterogeneity and progression. Investigating molecular data through multi-omics approaches is crucial for unraveling the complex biological mechanisms underlying cancer, thereby enabling more effective diagnosis, treatment, and prevention strategies. However, predicting patient outcomes through the integration of all available multi-omics data is still an understudied research direction. Here, we present SeNMo, a foundation model that has been trained on multi-omics data across 33 cancer types. SeNMo is particularly efficient in handling multi-omics data characterized by high-width (many features) and low-length (few samples) attributes. We trained SeNMo to predict patients' overall survival using pan-cancer multi-omics data involving 33 cancer sites from the GDC. The training multi-omics data includes gene expression, DNA methylation, miRNA expression, DNA mutations, protein expression modalities, and clinical data. SeNMo was validated on two independent cohorts: Moffitt Cancer Center and CPTAC lung squamous cell carcinoma. We evaluated the model's performance in predicting patients' overall survival using the C-Index. SeNMo performed consistently well in the training regime, reflected by the validation C-Index of 0.76 on GDC's public data. In the testing regime, SeNMo performed with a C-Index of 0.758 on a held-out test set. The model showed an average accuracy of 99.8% on the task of classifying the primary cancer type on the pan-cancer test cohort. SeNMo demonstrated robust performance on the classification task of predicting the primary cancer type of patients. SeNMo further demonstrated significant performance in predicting tertiary lymph structures from multi-omics data, showing generalizability across cancer types, molecular data types, and clinical endpoints.
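The abstract reports survival performance as a C-Index (concordance index) but does not say how it was computed. As a rough illustration only, the sketch below implements the common Harrell-style definition: among comparable patient pairs, the fraction in which the patient with the shorter observed survival was assigned the higher predicted risk. The function name, the handling of ties, and the toy data are assumptions made for this example and are not taken from the SeNMo paper.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index (illustrative sketch, not the paper's code).

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times, and skip pairs where the
        # shorter time is censored (the true ordering is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk died earlier: correct order
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(f"C-index on toy data: {toy_c:.3f}")
```

Under this definition, 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the reported values of 0.76 and 0.758.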
Related papers
- Multi-modal Medical Image Fusion For Non-Small Cell Lung Cancer Classification [7.002657345547741]
Non-small cell lung cancer (NSCLC) is a predominant cause of cancer mortality worldwide.
In this paper, we introduce an innovative integration of multi-modal data, synthesizing fused medical imaging (CT and PET scans) with clinical health records and genomic data.
Our research surpasses existing approaches, as evidenced by a substantial enhancement in NSCLC detection and classification precision.
arXiv Detail & Related papers (2024-09-27T12:59:29Z)
- LASSO-MOGAT: A Multi-Omics Graph Attention Framework for Cancer Classification [41.94295877935867]
This paper introduces LASSO-MOGAT, a graph-based deep learning framework that integrates messenger RNA, microRNA, and DNA methylation data to classify 31 cancer types.
arXiv Detail & Related papers (2024-08-30T16:26:04Z)
- Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes [0.0]
PARADIGM is a framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction.
We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data.
Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.
arXiv Detail & Related papers (2024-06-11T22:19:14Z)
- Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion [68.45407109385306]
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.
There has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data.
In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy.
arXiv Detail & Related papers (2023-11-30T15:11:03Z)
- Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information [13.57379781623848]
We introduce a novel sparsely gated RNA-seq analysis framework called Gene-MOE.
Gene-MOE exploits the potential of the MOE layers and the proposed mixture of attention expert layers to enhance the analysis accuracy.
It addresses overfitting challenges by integrating pan-cancer information from 33 distinct cancer types through pre-training.
arXiv Detail & Related papers (2023-11-29T07:09:25Z)
- Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data [75.77035221531261]
Cancer-Net PCa-Data is an open-source benchmark dataset of volumetric CDI$^s$ imaging data of PCa patients.
Cancer-Net PCa-Data is the first-ever public dataset of CDI$^s$ imaging data for PCa.
arXiv Detail & Related papers (2023-11-20T10:28:52Z)
- Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction [43.1748594898772]
We propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction.
We emphasize the unsupervised pretraining to capture the intrinsic interaction between tissue microenvironments in gigapixel whole slide images.
We evaluate our approach on both TCGA colon and rectum cancer cohorts, showing that the proposed approach is competitive and outperforms state-of-the-art studies.
arXiv Detail & Related papers (2023-07-22T00:59:26Z)
- Deep Orthogonal Fusion: Multimodal Prognostic Biomarker Discovery Integrating Radiology, Pathology, Genomic, and Clinical Data [0.32622301272834525]
We predict the overall survival (OS) of glioma patients from diverse multimodal data with a Deep Orthogonal Fusion model.
The model learns to combine information from MRI exams, biopsy-based modalities, and clinical variables into a comprehensive multimodal risk score.
It significantly stratifies glioma patients by OS within clinical subsets, adding further granularity to prognostic clinical grading and molecular subtyping.
arXiv Detail & Related papers (2021-07-01T17:59:01Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration [0.0]
We propose a novel mixed graphical model approach to analyze multi-omic data of different types.
We find that our method is superior in terms of both computational efficiency and the accuracy of the model selection results.
arXiv Detail & Related papers (2020-05-08T16:34:58Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.