Gene-MOE: A sparsely gated prognosis and classification framework
exploiting pan-cancer genomic information
- URL: http://arxiv.org/abs/2311.17401v3
- Date: Mon, 18 Dec 2023 12:37:17 GMT
- Title: Gene-MOE: A sparsely gated prognosis and classification framework
exploiting pan-cancer genomic information
- Authors: Xiangyu Meng, Xue Li, Qing Yang, Huanhuan Dai, Lian Qiao, Hongzhen
Ding, Long Hao and Xun Wang
- Abstract summary: We introduce a novel sparsely gated RNA-seq analysis framework called Gene-MOE.
Gene-MOE exploits the potential of the MOE layers and the proposed mixture of attention expert layers to enhance the analysis accuracy.
It addresses overfitting challenges by integrating pan-cancer information from 33 distinct cancer types through pre-training.
- Score: 13.57379781623848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benefiting from the advancements in deep learning, various genomic analytical
techniques, such as survival analysis, classification of tumors and their
subtypes, and exploration of specific pathways, have significantly enhanced our
understanding of the biological mechanisms driving cancer. However, the
overfitting issue, arising from the limited number of patient samples, poses a
challenge in improving the accuracy of genome analysis by deepening the neural
network. Furthermore, it remains uncertain whether novel approaches such as the
sparsely gated mixture of expert (MOE) and self-attention mechanisms can
improve the accuracy of genomic analysis. In this paper, we introduce a novel
sparsely gated RNA-seq analysis framework called Gene-MOE. This framework
exploits the potential of the MOE layers and the proposed mixture of attention
expert (MOAE) layers to enhance the analysis accuracy. Additionally, it
addresses overfitting challenges by integrating pan-cancer information from 33
distinct cancer types through pre-training. We pre-trained Gene-MOE on the TCGA
pan-cancer RNA-seq dataset with 33 cancer types. Subsequently, we conducted
experiments involving cancer classification and survival analysis based on the
pre-trained Gene-MOE. According to the survival analysis results on 14 cancer
types, Gene-MOE outperformed state-of-the-art models on 12 cancer types.
Through detailed feature analysis, we found that the Gene-MOE model could learn
rich feature representations of high-dimensional genes. According to the
classification results, the total accuracy of the classification model for 33
cancer classifications reached 95.8%, representing the best performance
compared to state-of-the-art models. These results indicate that Gene-MOE holds
strong potential for use in cancer classification and survival analysis.
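To make the term "sparsely gated" in the abstract concrete, here is a minimal sketch of a generic top-k mixture-of-experts layer applied to one expression vector. It is not the Gene-MOE architecture: the class name `SparseMoE`, the layer sizes, the single-linear-layer experts with ReLU, and the random initialization are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseMoE:
    """Hypothetical top-k gated mixture of experts (illustration only)."""

    def __init__(self, n_genes, n_hidden, n_experts, top_k, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.top_k = top_k
        self.gate = init(n_experts, n_genes)  # one gating row per expert
        self.experts = [init(n_hidden, n_genes) for _ in range(n_experts)]

    def route(self, x):
        """Return {expert index: weight} for the top_k highest gate logits."""
        logits = matvec(self.gate, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        # Softmax only over the selected experts, so the weights sum to 1.
        weights = softmax([logits[i] for i in chosen])
        return dict(zip(chosen, weights))

    def forward(self, x):
        """Evaluate only the routed experts and mix their ReLU outputs."""
        routing = self.route(x)
        n_hidden = len(self.experts[0])
        out = [0.0] * n_hidden
        for idx, weight in routing.items():
            hidden = matvec(self.experts[idx], x)
            for j in range(n_hidden):
                out[j] += weight * max(hidden[j], 0.0)
        return out, routing


# Toy input: 8 made-up expression values standing in for one RNA-seq sample.
layer = SparseMoE(n_genes=8, n_hidden=4, n_experts=6, top_k=2)
expression = [0.5, 1.2, 0.0, 3.1, 0.7, 0.0, 2.2, 0.9]
features, routing = layer.forward(expression)
```

Only `top_k` of the six experts are evaluated per sample, which is the property that lets such layers add capacity without a proportional increase in per-sample compute.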
Related papers
- Precision Cancer Classification and Biomarker Identification from mRNA Gene Expression via Dimensionality Reduction and Explainable AI [0.9423257767158634]
This research presents a comprehensive pipeline designed to accurately identify 33 distinct cancer types and their corresponding gene sets.
It incorporates a combination of normalization and feature selection techniques to reduce dataset dimensionality effectively.
We leverage Explainable AI to elucidate the biological significance of the identified cancer-specific genes.
arXiv Detail & Related papers (2024-10-08T18:56:31Z)
- Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB were evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs).
arXiv Detail & Related papers (2024-08-13T23:24:36Z)
- Self-Normalizing Foundation Model for Enhanced Multi-Omics Data Analysis in Oncology [0.0]
SeNMo is a foundation model that has been trained on multi-omics data across 33 cancer types.
We trained SeNMo for the task of overall survival of patients using pan-cancer multi-omics data involving 33 cancer sites.
SeNMo was validated on two independent cohorts: Moffitt Cancer Center and CPTAC lung squamous cell carcinoma.
arXiv Detail & Related papers (2024-05-13T22:45:44Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Pan-Cancer Integrative Histology-Genomic Analysis via Interpretable Multimodal Deep Learning [4.764927152701701]
We integrate whole slide pathology images, RNA-seq abundance, copy number variation, and mutation data from 5,720 patients across 14 major cancer types.
Our interpretable, weakly-supervised, multimodal deep learning algorithm is able to fuse these heterogeneous modalities for predicting outcomes.
We analyze morphologic and molecular markers responsible for prognostic predictions across all cancer types.
arXiv Detail & Related papers (2021-08-04T20:40:05Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- The scalable Birth-Death MCMC Algorithm for Mixed Graphical Model Learning with Application to Genomic Data Integration [0.0]
We propose a novel mixed graphical model approach to analyze multi-omic data of different types.
We find that our method is superior in terms of both computational efficiency and the accuracy of the model selection results.
arXiv Detail & Related papers (2020-05-08T16:34:58Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.