Biomarker Gene Identification for Breast Cancer Classification
- URL: http://arxiv.org/abs/2111.05546v1
- Date: Wed, 10 Nov 2021 06:38:50 GMT
- Title: Biomarker Gene Identification for Breast Cancer Classification
- Authors: Sheetal Rajpal, Ankit Rajpal, Manoj Agarwal, Naveen Kumar
- Abstract summary: The present work uses interpretable predictions made by the deep neural network employed for subtype classification to identify biomarkers.
The proposed algorithm led to the discovery of a set of 43 differentially expressed gene signatures.
- Score: 2.403531305046943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BACKGROUND: Breast cancer has emerged as one of the most prevalent cancers
among women leading to a high mortality rate. Due to the heterogeneous nature
of breast cancer, there is a need to identify differentially expressed genes
associated with breast cancer subtypes for its timely diagnosis and treatment.
OBJECTIVE: To identify a small gene set for each of the four breast cancer
subtypes that could act as its signature, the paper proposes a novel algorithm
for gene signature identification. METHODS: The present work uses interpretable
AI methods to investigate the predictions made by the deep neural network
employed for subtype classification to identify biomarkers using the TCGA
breast cancer RNA Sequence data. RESULTS: The proposed algorithm led to the
discovery of a set of 43 differentially expressed gene signatures. We achieved
a competitive average 10-fold accuracy of 0.91 using a neural network
classifier. Further, gene set analysis revealed several relevant pathways, such
as GRB7 events in ERBB2 and p53 signaling pathway. Using the Pearson
correlation matrix, we noted that the subtype-specific genes are correlated
within each subtype. CONCLUSIONS: The proposed technique enables us to find a
concise and clinically relevant gene signature set.
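
The within-subtype correlation check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the expression matrix is synthetic, and the names `expr`, `signature`, and `mean_offdiag` are hypothetical. It only shows how a Pearson correlation matrix could be used to compare correlation within a signature gene set against correlation across sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for expression data (assumption, not TCGA):
# 60 samples x 6 genes, where genes 0-2 share one latent factor
# and genes 3-5 share another, plus independent noise.
n_samples = 60
factor_a = rng.normal(size=n_samples)
factor_b = rng.normal(size=n_samples)
noise = 0.3 * rng.normal(size=(n_samples, 6))
expr = np.column_stack([factor_a] * 3 + [factor_b] * 3) + noise

# Hypothetical signature sets: column indices of genes per subtype.
signature = {"subtype_A": [0, 1, 2], "subtype_B": [3, 4, 5]}

# Pearson correlation between genes (columns are the variables).
corr = np.corrcoef(expr, rowvar=False)


def mean_offdiag(matrix, idx):
    """Mean pairwise correlation among the genes in idx, excluding the diagonal."""
    block = matrix[np.ix_(idx, idx)]
    mask = ~np.eye(len(idx), dtype=bool)
    return block[mask].mean()


within_a = mean_offdiag(corr, signature["subtype_A"])
within_b = mean_offdiag(corr, signature["subtype_B"])
between = corr[np.ix_(signature["subtype_A"], signature["subtype_B"])].mean()

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

On real data, `expr` would be replaced by the samples-by-genes expression matrix restricted to the signature genes. The toy data is built so that the within-set correlations come out high, so the printed numbers say nothing about the paper's results.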
Related papers
- Prompting Whole Slide Image Based Genetic Biomarker Prediction [13.764676578911526]
We propose a whole slide image (WSI) based genetic biomarker prediction method via prompting techniques.
We leverage large language models to generate medical prompts that serve as prior knowledge in extracting instances associated with genetic biomarkers.
We adopt a coarse-to-fine approach to mine biomarker information within the tumor microenvironment.
arXiv Detail & Related papers (2024-06-26T11:05:46Z)
- Biomarker based Cancer Classification using an Ensemble with Pre-trained Models [2.2436844508175224]
We propose a novel ensemble model combining pre-trained Hyperfast model, XGBoost, and LightGBM for multi-class classification tasks.
We leverage a meta-trained Hyperfast model for classifying cancer, accomplishing the highest AUC of 0.9929.
The ensemble achieves an incremental increase in accuracy (0.9464).
arXiv Detail & Related papers (2024-06-14T14:43:59Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Deep Learning Based Model for Breast Cancer Subtype Classification [3.419451872918847]
This paper focuses on the use of gene expression data for the classification of breast cancer into four subtypes, Basal, Her2, LumA, and LumB.
The size of the feature set is reduced from 20,530 gene expression values to 500 by using an autoencoder.
By deploying the combined network of stages 1 and 2, we have been able to attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset.
arXiv Detail & Related papers (2021-11-06T17:15:35Z)
- DeepGene Transformer: Transformer for the gene expression-based classification of cancer subtypes [5.179504118679301]
Cancer and its subtypes constitute approximately 30% of all causes of death globally.
The proposed DeepGene Transformer addresses the complexity of high-dimensional gene expression with a multi-head self-attention module.
arXiv Detail & Related papers (2021-08-26T15:02:55Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic data using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.