Machine Learning Methods for Cancer Classification Using Gene Expression
Data: A Review
- URL: http://arxiv.org/abs/2301.12222v1
- Date: Sat, 28 Jan 2023 15:03:03 GMT
- Authors: Fadi Alharbi and Aleksandar Vakanski
- Abstract summary: Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cancer is a term that denotes a group of diseases caused by abnormal growth
of cells that can spread to different parts of the body. According to the World
Health Organization (WHO), cancer is the second major cause of death after
cardiovascular diseases. Gene expression can play a fundamental role in the
early detection of cancer, as it is indicative of the biochemical processes in
tissue and cells, as well as the genetic characteristics of an organism.
Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA) sequencing
methods for gene expression data allow quantifying the expression levels of
genes and produce valuable data for computational analysis. This study reviews
recent progress in gene expression analysis for cancer classification using
machine learning methods. Both conventional and deep learning-based approaches
are reviewed, with an emphasis on the application of deep learning models due
to their comparative advantages for identifying gene patterns that are
distinctive for various types of cancers. Relevant works that employ the most
commonly used deep neural network architectures are covered, including
multi-layer perceptrons, convolutional, recurrent, graph, and transformer
networks. This survey also presents an overview of the data collection methods
for gene expression analysis and lists important datasets that are commonly
used for supervised machine learning for this task. It also reviews feature
engineering and data preprocessing techniques that are typically used to
handle the high dimensionality of gene expression data, which is caused by
the large number of genes present in the data samples. The paper concludes
with a discussion of future research directions for machine learning-based gene
expression analysis for cancer classification.
Related papers
- Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks [6.869831177092736]
We analyzed scRNA-seq data from 181 tumor biopsies across 13 cancer types.
High-dimensional weighted gene co-expression network analysis (hdWGCNA) was performed to identify relevant gene sets.
Oncogenes from OncoKB were evaluated with deep learning models, including multilayer perceptrons (MLPs) and graph neural networks (GNNs).
arXiv Detail & Related papers (2024-08-13T23:24:36Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches [1.8954222800767324]
We discuss the biological and the methodological limitations of machine learning models to classify cancer samples.
Gene rankings are obtained from explainability methods adapted to these models.
We observe that the information learned by black-box neural networks is related to the notion of differential expression.
arXiv Detail & Related papers (2024-02-01T18:17:36Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- An end-to-end framework for gene expression classification by integrating a background knowledge graph: application to cancer prognosis prediction [1.5484595752241122]
We proposed an end-to-end framework to handle secondary data to construct a classification model for primary data.
We applied this framework to cancer prognosis prediction using gene expression data and a biological network.
arXiv Detail & Related papers (2023-06-29T11:20:47Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- SimpleChrome: Encoding of Combinatorial Effects for Predicting Gene Expression [8.326669256957352]
We present SimpleChrome, a deep learning model that learns the histone modification representations of genes.
The features learned from the model allow us to better understand the latent effects of cross-gene interactions and direct gene regulation on the target gene expression.
arXiv Detail & Related papers (2020-12-15T23:30:36Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Low-Rank Reorganization via Proportional Hazards Non-negative Matrix Factorization Unveils Survival Associated Gene Clusters [9.773075235189525]
In this work, Cox proportional hazards regression is integrated with NMF by imposing survival constraints.
Using human cancer gene expression data, the proposed technique can unravel critical clusters of cancer genes.
The discovered gene clusters reflect rich biological implications and can help identify survival-related biomarkers.
arXiv Detail & Related papers (2020-08-09T17:59:30Z)