BoMGene: Integrating Boruta-mRMR feature selection for enhanced Gene expression classification
- URL: http://arxiv.org/abs/2510.00907v1
- Date: Wed, 01 Oct 2025 13:47:08 GMT
- Title: BoMGene: Integrating Boruta-mRMR feature selection for enhanced Gene expression classification
- Authors: Bich-Chung Phan, Thanh Ma, Huu-Hoa Nguyen, Thanh-Nghi Do
- Abstract summary: BoMGene is a hybrid feature selection method that integrates Boruta and Minimum Redundancy Maximum Relevance (mRMR). The proposed approach demonstrates clear advantages in accuracy, stability, and practical applicability for multi-class gene expression data analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature selection is a crucial step in analyzing gene expression data, enhancing classification performance, and reducing computational costs for high-dimensional datasets. This paper proposes BoMGene, a hybrid feature selection method that effectively integrates two popular techniques: Boruta and Minimum Redundancy Maximum Relevance (mRMR). The method aims to optimize the feature space and enhance classification accuracy. Experiments were conducted on 25 publicly available gene expression datasets, employing widely used classifiers such as Support Vector Machine (SVM), Random Forest, XGBoost (XGB), and Gradient Boosting Machine (GBM). The results show that the Boruta-mRMR combination selects fewer features than mRMR alone, which speeds up training while maintaining or even improving classification accuracy compared with the individual feature selection methods. The proposed approach demonstrates clear advantages in accuracy, stability, and practical applicability for multi-class gene expression data analysis.
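The abstract does not say exactly how the two selectors are chained. The following is a minimal sketch, assuming Boruta runs first to keep the "all-relevant" features and mRMR then ranks that reduced set; it is not the authors' code. The function names (`boruta_confirm`, `mrmr`, `boruta_mrmr`), the 20 rounds, the 100-tree random forest and the 5-bin quantile discretization are all hypothetical choices. Boruta is simplified here (a fixed number of rounds and a one-sided binomial test with no multiple-testing correction), and mRMR uses the mutual-information difference (MID) criterion with plug-in estimates on discretized features. It requires NumPy and scikit-learn.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def boruta_confirm(X, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta (hypothetical settings): keep features that beat the best
    shadow feature significantly more often than chance (one-sided binomial test)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: every column shuffled independently, so none relates to y.
        shadow = rng.permuted(X, axis=0)
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + it)
        forest.fit(np.hstack([X, shadow]), y)
        imp = forest.feature_importances_
        # A "hit" means the real feature outranks the best shadow feature this round.
        hits += imp[:n_features] > imp[n_features:].max()

    def p_value(h):
        # P(H >= h) for H ~ Binomial(n_iter, 0.5).
        return sum(math.comb(n_iter, k) for k in range(h, n_iter + 1)) / 2**n_iter

    return [j for j in range(n_features) if p_value(int(hits[j])) < alpha]


def discretize(v, bins=5):
    # Quantile bins so mutual information can be estimated from counts.
    edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(v, edges)


def mutual_info(a, b):
    # Plug-in estimate (in nats) of I(a; b) for non-negative integer arrays.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    expected = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / expected[nz])))


def mrmr(X, y, candidates, k):
    """Greedy mRMR, MID form: repeatedly add the feature f maximizing
    I(f; y) - mean over selected s of I(f; s). Labels y must be integers 0..C-1."""
    if not candidates:
        return []
    codes = {j: discretize(X[:, j]) for j in candidates}
    relevance = {j: mutual_info(codes[j], y) for j in candidates}
    selected = [max(candidates, key=relevance.get)]
    while len(selected) < min(k, len(candidates)):
        rest = [j for j in candidates if j not in selected]
        score = {
            j: relevance[j] - np.mean([mutual_info(codes[j], codes[s]) for s in selected])
            for j in rest
        }
        selected.append(max(rest, key=score.get))
    return selected


def boruta_mrmr(X, y, k, seed=0):
    # Assumed order: Boruta filters to all-relevant features, then mRMR ranks them.
    return mrmr(X, y, boruta_confirm(X, y, seed=seed), k)


# Toy data: features 0 and 1 drive the label, feature 2 nearly duplicates feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=200)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(boruta_mrmr(X, y, k=2))
```

On this toy data the near-duplicate feature should pass the Boruta stage (it is relevant) while mRMR avoids selecting it alongside feature 0 (it is redundant). That division of labour, Boruta for relevance and mRMR for redundancy, is one plausible reading of why the hybrid can end up with fewer features than mRMR alone.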
Related papers
- HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search [10.751560953850925]
We introduce the HeFS (Helper-Enhanced Feature Selection) framework to refine feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space to identify a Helper Set: features that complement the original subset and improve classification performance. Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies yet informative features and achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-10-21T12:30:22Z)
- TayFCS: Towards Light Feature Combination Selection for Deep Recommender Systems [44.80081613834248]
The method uses a Taylor Expansion Scorer (TayScorer) module to perform field-wise Taylor expansion on the base model. Logistic Regression Elimination (LRE) estimates the corresponding information gain based on model prediction performance.
arXiv Detail & Related papers (2025-07-05T04:22:42Z)
- BOLIMES: Boruta and LIME optiMized fEature Selection for Gene Expression Classification [0.0937465283958018]
BOLIMES is a novel feature selection algorithm designed to enhance gene expression classification. It combines exhaustive feature selection with interpretability-driven refinement, offering a powerful solution for high-dimensional gene expression analysis.
arXiv Detail & Related papers (2025-02-18T17:33:41Z)
- Permutation-based multi-objective evolutionary feature selection for high-dimensional data [43.18726655647964]
We propose a novel feature selection method for high-dimensional data, based on the well-known permutation feature importance approach. The proposed method employs a multi-objective evolutionary algorithm to search for candidate feature subsets. The effectiveness of our method has been validated on a set of 24 publicly available high-dimensional datasets.
arXiv Detail & Related papers (2025-01-24T08:11:28Z)
- Distance-based mutual congestion feature selection with genetic algorithm for high-dimensional medical datasets [2.6037922505725675]
There isn't a universally optimal feature selection method applicable to any data distribution.
This paper introduces the Distance-based Mutual Congestion (DMC) as a filter method that considers both the feature values and the distribution of observations in the response variable.
The hybrid DMC-GAwAR is applicable to binary classification datasets, and experimental results demonstrate its superiority over some recent works.
arXiv Detail & Related papers (2024-07-22T13:08:50Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model trained on a joint objective of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Embedded Multi-label Feature Selection via Orthogonal Regression [45.55795914923279]
State-of-the-art embedded multi-label feature selection algorithms based on least squares regression cannot preserve sufficient discriminative information in multi-label data.
A novel embedded multi-label feature selection method is proposed to facilitate the multi-label feature selection.
Extensive experimental results on ten multi-label data sets demonstrate the effectiveness of GRROOR.
arXiv Detail & Related papers (2024-03-01T06:18:40Z)
- Subspace Learning for Feature Selection via Rank Revealing QR Factorization: Unsupervised and Hybrid Approaches with Non-negative Matrix Factorization and Evolutionary Algorithm [0.0]
Rank-revealing QR (RRQR) factorization is leveraged to obtain the most informative features, yielding a novel unsupervised feature selection technique.
A hybrid feature selection algorithm is proposed by coupling RRQR, as a filter-based technique, and a Genetic algorithm as a wrapper-based technique.
The proposed algorithm is shown to be dependable and robust when compared against state-of-the-art feature selection algorithms in supervised, unsupervised, and semi-supervised settings.
arXiv Detail & Related papers (2022-10-02T04:04:47Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm appears to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Optimally Combining Classifiers for Semi-Supervised Learning [43.77365242185884]
We propose a new semi-supervised learning method that is able to adaptively combine the strengths of XGBoost and the transductive support vector machine.
The experimental results on the UCI data sets and a real commercial data set demonstrate the superior classification performance of our method over five state-of-the-art algorithms.
arXiv Detail & Related papers (2020-06-07T09:28:34Z)
- A New Gene Selection Algorithm using Fuzzy-Rough Set Theory for Tumor Classification [0.0]
We present a new technique for gene selection using a discernibility matrix of fuzzy-rough sets.
The proposed technique takes into account the similarity of those instances that have the same and different class labels to improve the gene selection results.
Experimental results demonstrate that this technique provides better efficiency compared to the state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-26T13:43:25Z)