Hybrid gene selection approach using XGBoost and multi-objective genetic
algorithm for cancer classification
- URL: http://arxiv.org/abs/2106.05841v1
- Date: Sun, 30 May 2021 03:43:22 GMT
- Title: Hybrid gene selection approach using XGBoost and multi-objective genetic
algorithm for cancer classification
- Authors: Xiongshi Deng, Min Li, Shaobo Deng, Lei Wang
- Abstract summary: We propose a two-stage gene selection approach by combining extreme gradient boosting (XGBoost) and a multi-objective optimization genetic algorithm (XGBoost-MOGA) for cancer classification in microarray datasets.
XGBoost-MOGA yields significantly better results than previous state-of-the-art algorithms in terms of various evaluation criteria, such as accuracy, F-score, precision, and recall.
- Score: 6.781877756322586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Microarray gene expression data typically contain a large number of
genes and a small number of samples. However, only a few of these genes are
relevant to cancer, resulting in significant gene selection challenges. Hence,
we propose a two-stage gene selection approach by combining extreme gradient
boosting (XGBoost) and a multi-objective optimization genetic algorithm
(XGBoost-MOGA) for cancer classification in microarray datasets. In the first
stage, the genes are ranked by an ensemble-based feature selection method using
XGBoost. This stage can effectively remove irrelevant genes and yield a group
comprising the genes most relevant to the class. In the second stage,
XGBoost-MOGA searches for an optimal gene subset within this group of most
relevant genes using a multi-objective optimization genetic algorithm. We
performed comprehensive experiments to compare XGBoost-MOGA with other
state-of-the-art feature selection methods using two well-known learning
classifiers on 13 publicly available microarray expression datasets. The
experimental results show that XGBoost-MOGA yields significantly better results
than previous state-of-the-art algorithms in terms of various evaluation
criteria, such as accuracy, F-score, precision, and recall.
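The two stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions: the importance scores passed to `top_k` are presumed to come from a fitted XGBoost model (for example the `feature_importances_` attribute of `xgboost.XGBClassifier`), the two objectives (an error term and the subset size, both minimized) are a guess since the abstract does not name them, `evaluate` is a toy stand-in for a cross-validated classifier error, and the survival rule is a plain non-dominated sort chosen for brevity. All function names are hypothetical.

```python
import random


def top_k(scores, k):
    """Stage 1 (sketch): indices of the k highest importance scores.
    The scores are assumed to come from a fitted XGBoost model."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]


def evaluate(mask, informative):
    """Toy objectives, both minimized: share of informative genes missed,
    and subset size. A real run would use classifier error instead."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    missed = len(informative - chosen)
    return (missed / len(informative), len(chosen))


def moga(n_genes, objective, pop_size=30, generations=40, seed=0):
    """Stage 2 (sketch): evolve binary gene masks and return the final
    non-dominated set as (mask, objectives) pairs. Requires n_genes >= 2."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_genes))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, n_genes)       # one-point crossover
            child = list(p1[:cut] + p2[cut:])
            j = rng.randrange(n_genes)            # single bit-flip mutation
            child[j] = 1 - child[j]
            children.append(tuple(child))
        merged = list(dict.fromkeys(pop + children))  # drop duplicate masks
        remaining = [(m, objective(m)) for m in merged]
        survivors = []
        # Peel successive Pareto fronts until the population is refilled.
        while remaining and len(survivors) < pop_size:
            front = pareto_front(remaining)
            survivors.extend(front)
            kept = {m for m, _ in front}
            remaining = [(m, f) for m, f in remaining if m not in kept]
        pop = [m for m, _ in survivors[:pop_size]]
    return pareto_front([(m, objective(m)) for m in pop])
```

A call such as `moga(8, lambda m: evaluate(m, {0, 3, 5}))` returns the trade-off set between the toy error and the number of selected genes. In a real pipeline the mask positions would index the genes kept by `top_k`, and a final subset would be picked from the returned front according to the preferred trade-off.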
Related papers
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner [20.07173196364489]
This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data.
Experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types.
arXiv Detail & Related papers (2024-04-06T08:07:48Z)
- Feature Selection via Robust Weighted Score for High Dimensional Binary Class-Imbalanced Gene Expression Data [1.2891210250935148]
A robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative feature for high dimensional gene expression binary classification with class-imbalance problem.
The performance of the proposed ROWSU method is evaluated on 6 gene expression datasets.
arXiv Detail & Related papers (2024-01-23T11:22:03Z)
- Genetic heterogeneity analysis using genetic algorithm and network science [2.6166087473624318]
Genome-wide association studies (GWAS) can identify disease susceptible genetic variables.
Genetic variables intertwined with genetic effects often exhibit lower effect-size.
This paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet).
arXiv Detail & Related papers (2023-08-12T01:28:26Z)
- A Novel Fuzzy Bi-Clustering Algorithm with AFS for Identification of Co-Regulated Genes [0.799536002595393]
This paper proposes a novel fuzzy bi-clustering algorithm for identification of co-regulated genes.
The proposed algorithm can effectively detect the co-regulated genes without any prior knowledge of the gene expression data.
arXiv Detail & Related papers (2023-02-03T08:35:49Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second major cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
arXiv Detail & Related papers (2023-01-28T15:03:03Z)
- Natural language processing for clusterization of genes according to their functions [62.997667081978825]
We propose an approach that reduces the analysis of several thousand genes to analysis of several clusters.
The descriptions are encoded as vectors using the pretrained language model (BERT) and some text processing approaches.
arXiv Detail & Related papers (2022-07-17T12:59:34Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes [76.84066556597342]
Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions.
Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters.
We propose a novel bi-clustering method based on the theory of Granular Computing.
arXiv Detail & Related papers (2020-05-12T02:04:40Z)
- A New Gene Selection Algorithm using Fuzzy-Rough Set Theory for Tumor Classification [0.0]
We present a new technique for gene selection using a discernibility matrix of fuzzy-rough sets.
The proposed technique takes into account the similarity of those instances that have the same and different class labels to improve the gene selection results.
Experimental results demonstrate that this technique provides better efficiency compared to the state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-26T13:43:25Z)