RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection
- URL: http://arxiv.org/abs/2512.04333v2
- Date: Sun, 07 Dec 2025 16:37:34 GMT
- Title: RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection
- Authors: Shreyas Shende, Varsha Narayanan, Vishal Fenn, Yiran Huang, Dincer Goksuluk, Gaurav Choudhary, Melih Agraz, Mengjia Xu,
- Abstract summary: We introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline.<n>Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes.<n>We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers.
- Score: 3.1900467994236372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/ ).
Related papers
- Hierarchical Pooling and Explainability in Graph Neural Networks for Tumor and Tissue-of-Origin Classification Using RNA-seq Data [0.19336815376402716]
We combine gene expression data from The Cancer Genome Atlas with a precomputed STRING protein-protein interaction network to classify tissue origin.<n>The model employs Chebyshev graph convolutions (K=2) and weighted pooling layers, aggregating gene clusters into'supernodes' across multiple coarsening levels.<n>Saliency methods were applied to interpret the model by identifying key genes and biological processes relevant to cancer.
arXiv Detail & Related papers (2026-01-10T01:33:56Z) - Genotype-Phenotype Integration through Machine Learning and Personalized Gene Regulatory Networks for Cancer Metastasis Prediction [1.7895738175272824]
Metastasis is the leading cause of cancer-related mortality.<n>We integrate classical machine learning and deep learning to predict metastatic potential across multiple cancer types.
arXiv Detail & Related papers (2025-10-22T05:20:13Z) - GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRN) is essential to understand and predict the effects of genetic perturbations.<n>In this work, we leverage pre-trained large language model and DNA sequence model to extract features from gene descriptions and DNA sequence data.<n>We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z) - Learning to Discover Regulatory Elements for Gene Expression Prediction [59.470991831978516]
Seq2Exp is a Sequence to Expression network designed to discover and extract regulatory elements that drive target gene expression.<n>Our approach captures the causal relationship between epigenomic signals, DNA sequences and their associated regulatory elements.
arXiv Detail & Related papers (2025-02-19T03:25:49Z) - scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders [43.24785083027205]
scMamba is a pre-trained model designed to improve the quality and utility of snRNA-seq analysis.<n>Inspired by the recent Mamba model, scMamba introduces a novel architecture that incorporates a linear adapter layer, gene embeddings, and bidirectional Mamba blocks.<n>We demonstrate that scMamba outperforms benchmark methods in various downstream tasks, including cell type annotation, doublet detection, imputation, and the identification of differentially expressed genes.
arXiv Detail & Related papers (2025-02-12T11:48:22Z) - Graph Structure Learning for Tumor Microenvironment with Cell Type Annotation from non-spatial scRNA-seq data [6.432270457083369]
We present a novel graph neural network (GNN) model that enhances cell type prediction and cell interaction analysis.<n>The proposed scGSL model demonstrated robust performance, achieving an average accuracy of 84.83%, precision of 86.23%, recall of 81.51%, and an F1 score of 80.92% across all datasets.
arXiv Detail & Related papers (2025-02-04T18:28:25Z) - Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs)
arXiv Detail & Related papers (2024-08-13T23:24:36Z) - scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis
in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z) - Machine Learning Methods for Cancer Classification Using Gene Expression
Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z) - Explainable Multilayer Graph Neural Network for Cancer Gene Prediction [21.83218536069088]
We introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes.
Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction.
Our method consistently outperforms all existing methods, with an average 7.15% improvement in area under the precision-recall curve (AUPR) over the current state-of-the-art method.
arXiv Detail & Related papers (2023-01-20T23:57:12Z) - rfPhen2Gen: A machine learning based association study of brain imaging
phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z) - Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based
Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network, for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E)
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC)
arXiv Detail & Related papers (2021-10-27T19:28:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.