Biomarker Gene Identification for Breast Cancer Classification
- URL: http://arxiv.org/abs/2111.05546v1
- Date: Wed, 10 Nov 2021 06:38:50 GMT
- Title: Biomarker Gene Identification for Breast Cancer Classification
- Authors: Sheetal Rajpal, Ankit Rajpal, Manoj Agarwal, Naveen Kumar
- Abstract summary: The present work uses interpretable predictions made by the deep neural network employed for subtype classification to identify biomarkers.
The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures.
- Score: 2.403531305046943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BACKGROUND: Breast cancer has emerged as one of the most prevalent cancers
among women, leading to a high mortality rate. Due to the heterogeneous nature
of breast cancer, there is a need to identify differentially expressed genes
associated with breast cancer subtypes for its timely diagnosis and treatment.
OBJECTIVE: To identify a small gene set for each of the four breast cancer
subtypes that could act as its signature, the paper proposes a novel algorithm
for gene signature identification. METHODS: The present work uses interpretable
AI methods to investigate the predictions made by the deep neural network
employed for subtype classification to identify biomarkers using the TCGA
breast cancer RNA Sequence data. RESULTS: The proposed algorithm led to the
discovery of a set of 43 differentially expressed gene signatures. We achieved
a competitive average 10-fold accuracy of 0.91 using a neural network
classifier. Further, gene set analysis revealed several relevant pathways, such
as GRB7 events in ERBB2 and p53 signaling pathway. Using the Pearson
correlation matrix, we noted that the subtype-specific genes are correlated
within each subtype. CONCLUSIONS: The proposed technique enables us to find a
concise and clinically relevant gene signature set.
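The within-subtype correlation check mentioned in the abstract can be illustrated with a minimal sketch. It runs on synthetic data only, not on TCGA expression values, and the helper names (pearson, correlation_matrix, mean_off_diagonal), the five-gene group size, and the noise level are assumptions made for illustration rather than the authors' implementation. It builds a Pearson correlation matrix for a hypothetical group of co-regulated "signature" genes and for a group of unrelated genes, then compares their average pairwise correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(genes):
    """Gene-by-gene Pearson matrix; `genes` is a list of per-gene expression lists."""
    return [[pearson(gi, gj) for gj in genes] for gi in genes]


def mean_off_diagonal(matrix):
    """Average of all entries excluding the diagonal (self-correlations)."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Synthetic stand-in for expression data (assumption: 200 samples, 5 genes per group).
random.seed(0)
n_samples = 200
shared = [random.gauss(0, 1) for _ in range(n_samples)]  # hypothetical shared subtype factor

# Signature genes follow the shared factor plus small independent noise.
signature_genes = [[s + 0.3 * random.gauss(0, 1) for s in shared] for _ in range(5)]
# Background genes are independent of each other.
background_genes = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(5)]

sig_corr = correlation_matrix(signature_genes)
bg_corr = correlation_matrix(background_genes)

print("signature mean correlation:", round(mean_off_diagonal(sig_corr), 2))
print("background mean correlation:", round(mean_off_diagonal(bg_corr), 2))
```

On real data, each inner list would hold one gene's expression values across the samples of a single subtype; a high average off-diagonal value would then be consistent with the correlation pattern the abstract describes.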
Related papers
- A Bioinformatic Approach Validated Utilizing Machine Learning Algorithms to Identify Relevant Biomarkers and Crucial Pathways in Gallbladder Cancer [2.3087284629747766]
Gallbladder cancer (GBC) is the most frequent cause of disease among biliary tract neoplasms.
Few recent studies have explored the roles of biomarkers in GBC.
We used machine learning (ML) and bioinformatics techniques to identify biomarkers in GBC.
arXiv Detail & Related papers (2024-10-18T12:51:19Z)
- Precision Cancer Classification and Biomarker Identification from mRNA Gene Expression via Dimensionality Reduction and Explainable AI [0.9423257767158634]
This research presents a comprehensive pipeline designed to accurately identify 33 distinct cancer types and their corresponding gene sets.
It incorporates a combination of normalization and feature selection techniques to reduce dataset dimensionality effectively.
We leverage Explainable AI to elucidate the biological significance of the identified cancer-specific genes.
arXiv Detail & Related papers (2024-10-08T18:56:31Z)
- Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB were evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs).
arXiv Detail & Related papers (2024-08-13T23:24:36Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Deep Learning Based Model for Breast Cancer Subtype Classification [3.419451872918847]
This paper focuses on the use of gene expression data for the classification of breast cancer into four subtypes, Basal, Her2, LumA, and LumB.
The size of the feature set is reduced from 20,530 gene expression values to 500 by using an autoencoder.
By deploying the combined network of stages 1 and 2, we have been able to attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset.
arXiv Detail & Related papers (2021-11-06T17:15:35Z)
- DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes [5.179504118679301]
Cancer and its subtypes constitute approximately 30% of all causes of death globally.
DeepGene Transformer is proposed which addresses the complexity of high-dimensional gene expression with a multi-head self-attention module.
arXiv Detail & Related papers (2021-08-26T15:02:55Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.