Dual-stage optimizer for systematic overestimation adjustment applied to
multi-objective genetic algorithms for biomarker selection
- URL: http://arxiv.org/abs/2312.16624v3
- Date: Thu, 29 Feb 2024 15:40:34 GMT
- Title: Dual-stage optimizer for systematic overestimation adjustment applied to
multi-objective genetic algorithms for biomarker selection
- Authors: Luca Cattelani and Vittorio Fortino
- Abstract summary: Biomarker identification with feature selection methods can be addressed as a multi-objective problem with trade-offs between predictive ability and parsimony in the number of features.
We propose DOSA-MO, a novel multi-objective optimization wrapper algorithm that learns how the original estimation, its variance, and the feature set size of the solutions predict the overestimation.
- Score: 0.18648070031379424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenge in biomarker discovery using machine learning from omics data
lies in the abundance of molecular features but scarcity of samples. Most
feature selection methods in machine learning require evaluating various sets
of features (models) to determine the most effective combination. This process,
typically conducted using a validation dataset, involves testing different
feature sets to optimize the model's performance. Evaluations carry performance
estimation error, and when the selection involves many models, the best ones are
almost certainly overestimated. Biomarker identification with feature selection
methods can be addressed as a multi-objective problem with trade-offs between
predictive ability and parsimony in the number of features. Genetic algorithms
are a popular tool for multi-objective optimization but they evolve numerous
solutions thus are prone to overestimation. Methods have been proposed to
reduce the overestimation after a model has already been selected in
single-objective problems, but no existing algorithm reduces the
overestimation during the optimization, improves model selection, or applies
in the more general multi-objective domain. We propose DOSA-MO, a novel
multi-objective optimization wrapper algorithm that learns how the original
estimation, its variance, and the feature set size of the solutions predict the
overestimation. DOSA-MO adjusts the expectation of the performance during the
optimization, improving the composition of the solution set. We verify that
DOSA-MO improves the performance of a state-of-the-art genetic algorithm on
left-out or external sample sets, when predicting cancer subtypes and/or
patient overall survival, using three transcriptomics datasets for kidney and
breast cancer.
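The abstract's adjustment idea (learning how the raw estimate, its variance, and the feature-set size predict overestimation, then correcting fitness during the search) can be illustrated with a minimal sketch. The linear model, the calibration data, and all names below are illustrative assumptions, not the paper's actual implementation:

```python
# Minimal sketch of the overestimation-adjustment idea: a regressor maps
# (raw performance estimate, estimate variance, feature-set size) to the
# expected overestimation, and the corrected fitness subtracts that amount.
# The linear model and the toy calibration data are illustrative assumptions.

def fit_linear(X, y):
    """Ordinary least squares with intercept, via normal equations."""
    n, d = len(X), len(X[0])
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    m = d + 1
    AtA = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(m)]
           for p in range(m)]
    Aty = [sum(A[i][p] * y[i] for i in range(n)) for p in range(m)]
    # Solve AtA * w = Aty by Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(AtA[r][col]))
        AtA[col], AtA[piv] = AtA[piv], AtA[col]
        Aty[col], Aty[piv] = Aty[piv], Aty[col]
        for r in range(col + 1, m):
            f = AtA[r][col] / AtA[col][col]
            for c in range(col, m):
                AtA[r][c] -= f * AtA[col][c]
            Aty[r] -= f * Aty[col]
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (Aty[r] - sum(AtA[r][c] * w[c]
                             for c in range(r + 1, m))) / AtA[r][r]
    return w

def adjusted_fitness(w, estimate, variance, n_features):
    """Raw estimate minus the predicted overestimation (never adds optimism)."""
    predicted = w[0] + w[1] * estimate + w[2] * variance + w[3] * n_features
    return estimate - max(0.0, predicted)

# Toy calibration data: inner-validation estimate, its variance, feature count,
# and the overestimation later observed on held-out samples.
X = [(0.90, 0.020, 30), (0.85, 0.010, 10), (0.70, 0.030, 50), (0.60, 0.010, 5)]
y = [0.10, 0.04, 0.12, 0.02]
w = fit_linear(X, y)
```

During a wrapper run, `adjusted_fitness` would replace the raw estimate as the objective value seen by the multi-objective optimizer, so that solution sets are composed from deflated rather than optimistic scores.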
Related papers
- Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting [49.1574468325115]
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimizes the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
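Sign-based zeroth-order descent can be sketched compactly: the gradient is estimated from objective evaluations alone, and each step moves by its sign. The coordinate-wise estimator, objective, and hyperparameters below are illustrative assumptions; practical ZO-signGD typically uses randomized gradient estimators:

```python
# Minimal sketch of zeroth-order sign-based gradient descent (ZO-signGD):
# no analytic gradient is used, only objective evaluations. A coordinate-wise
# forward finite difference stands in for the randomized estimators used in
# practice; objective and step sizes are illustrative.

def zo_sign_gd(f, x, steps=200, mu=1e-3, lr=0.05):
    x = list(x)
    for _ in range(steps):
        fx = f(x)
        g = []
        for i in range(len(x)):
            x_plus = list(x)
            x_plus[i] += mu  # probe coordinate i with a small perturbation
            g.append((f(x_plus) - fx) / mu)
        # Move each coordinate by a fixed step in the sign-descent direction.
        x = [xi - lr * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
    return x

# Toy objective: squared distance to the point (1, -2).
f = lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2
x_opt = zo_sign_gd(f, [4.0, 3.0])
```

Because every step has fixed length `lr`, the iterate converges to within roughly `lr` of the minimizer and then oscillates, which is the characteristic behavior of sign-based methods.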
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Multi-objective hyperparameter optimization with performance uncertainty [62.997667081978825]
This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of Machine Learning algorithms.
We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise.
Experimental results on three analytical test functions and three ML problems show the improvement over multi-objective TPE and GPR.
arXiv Detail & Related papers (2022-09-09T14:58:43Z)
- Fair Feature Subset Selection using Multiobjective Genetic Algorithm [0.0]
We present a feature subset selection approach that improves both fairness and accuracy objectives.
We use statistical disparity as a fairness metric and F1-Score as a metric for model performance.
Our experiments on the most commonly used fairness benchmark datasets show that using the evolutionary algorithm we can effectively explore the trade-off between fairness and accuracy.
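Exploring such a trade-off amounts to keeping the non-dominated (Pareto) set of candidate feature subsets over the two objectives. A minimal sketch, maximizing F1 and minimizing statistical disparity; the candidate objective values are hypothetical:

```python
# Minimal sketch: extract the Pareto front over (F1, statistical disparity),
# maximizing F1 and minimizing disparity. Candidate values are illustrative.

def dominates(a, b):
    """a dominates b: no worse in both objectives, strictly better in one."""
    f1_a, disp_a = a
    f1_b, disp_b = b
    return (f1_a >= f1_b and disp_a <= disp_b) and (f1_a > f1_b or disp_a < disp_b)

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (F1 score, statistical disparity) for hypothetical feature subsets.
subsets = [(0.82, 0.15), (0.80, 0.05), (0.78, 0.04), (0.82, 0.20), (0.75, 0.04)]
front = pareto_front(subsets)  # the three non-dominated trade-off points
```

A genetic algorithm such as NSGA-II applies this dominance test generation after generation, so the surviving population approximates the full fairness/accuracy front rather than a single compromise model.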
arXiv Detail & Related papers (2022-04-30T22:51:19Z)
- The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants [2.3823600586675724]
Recent studies show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances.
We consider the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models.
arXiv Detail & Related papers (2022-04-15T11:55:28Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
- Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization Minimization [6.41804410246642]
Feature selection is an important computational process in data mining and machine learning.
In this paper, a novel method based on homotopy iterative hard threshold (HIHT) is proposed to solve the least squares problem for multi-class feature selection.
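The core of hard-threshold methods for $l_{2,0}$-constrained (row-sparse) multi-class selection can be sketched briefly. The homotopy schedule of HIHT is omitted here; only the gradient-step-plus-row-thresholding loop is shown, and the data, step size, and names are illustrative assumptions:

```python
# Minimal sketch of iterative hard thresholding for multi-class feature
# selection under an l_{2,0} (row-sparsity) constraint: gradient descent on
# ||XW - Y||_F^2, keeping after each step only the k rows of W with the
# largest l2 norm. The homotopy continuation of HIHT is omitted; data and
# hyperparameters are illustrative.

def hard_threshold_rows(W, k):
    """Zero all but the k rows of W with the largest l2 norm."""
    norms = [sum(v * v for v in row) for row in W]
    keep = set(sorted(range(len(W)), key=lambda i: -norms[i])[:k])
    return [row if i in keep else [0.0] * len(row) for i, row in enumerate(W)]

def iht_feature_selection(X, Y, k, steps=300, lr=0.01):
    n, d, c = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * c for _ in range(d)]
    for _ in range(steps):
        R = [[sum(X[i][j] * W[j][t] for j in range(d)) - Y[i][t]
              for t in range(c)] for i in range(n)]            # residual XW - Y
        G = [[2.0 * sum(X[i][j] * R[i][t] for i in range(n))
              for t in range(c)] for j in range(d)]            # gradient 2 X^T R
        W = [[W[j][t] - lr * G[j][t] for t in range(c)] for j in range(d)]
        W = hard_threshold_rows(W, k)                          # enforce row sparsity
    return W

# Toy data: 4 candidate features, but only features 0 and 2 generate Y.
X = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0],
     [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
Y = [[1, 1], [0, 0], [1, 0], [0, 1], [1, 0], [0, 1]]
W = iht_feature_selection(X, Y, k=2)
```

The nonzero rows of the returned `W` identify the selected features; on this toy problem the loop recovers the two generating features and drives the remaining rows to zero.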
arXiv Detail & Related papers (2020-10-08T02:06:06Z)
- Landscape-Aware Fixed-Budget Performance Regression and Algorithm Selection for Modular CMA-ES Variants [1.0965065178451106]
We show that it is possible to achieve high-quality performance predictions with off-the-shelf supervised learning approaches.
We test this approach on a portfolio of very similar algorithms, which we choose from the family of modular CMA-ES algorithms.
arXiv Detail & Related papers (2020-06-17T13:34:57Z)
- Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data [8.121462458089143]
This paper proposes a novel multi-objective dimensionality reduction framework.
It incorporates both discretization and feature reduction in an ensemble model that performs feature selection and discretization jointly.
arXiv Detail & Related papers (2020-04-16T06:32:15Z)
- Discovering Representations for Black-box Optimization [73.59962178534361]
We show that black-box optimization encodings can be automatically learned, rather than hand designed.
We show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than the standard MAP-Elites.
arXiv Detail & Related papers (2020-03-09T20:06:20Z)
- Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.