A Hybrid Two-layer Feature Selection Method Using Genetic Algorithm and
Elastic Net
- URL: http://arxiv.org/abs/2001.11177v1
- Date: Thu, 30 Jan 2020 05:01:30 GMT
- Title: A Hybrid Two-layer Feature Selection Method Using Genetic Algorithm and
Elastic Net
- Authors: Fatemeh Amini and Guiping Hu
- Abstract summary: This paper presents a new hybrid two-layer feature selection approach that combines a wrapper and an embedded method.
The Genetic Algorithm (GA) has been adopted as a wrapper to search for the optimal subset of predictors.
A second layer is added to the proposed method to eliminate any remaining redundant/irrelevant predictors to improve the prediction accuracy.
- Score: 6.85316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection, as a critical pre-processing step for machine learning,
aims at determining representative predictors from a high-dimensional feature
space to improve prediction accuracy. However, the growth of feature space
dimensionality relative to the number of observations poses a severe challenge
to many existing feature selection methods with respect to computational
efficiency and prediction performance. This paper presents a new hybrid
two-layer feature selection approach that combines a wrapper and an embedded
method to construct an appropriate subset of predictors. In the first layer of
the proposed method, the Genetic Algorithm (GA) is adopted as a wrapper to
search for the optimal subset of predictors, aiming to reduce both the number
of predictors and the prediction error. GA is selected among meta-heuristic
approaches for its computational efficiency; however, it does not guarantee
optimality. To address this issue, a second layer is added to eliminate any
remaining redundant or irrelevant predictors and further improve prediction
accuracy. Elastic Net (EN) is selected as the embedded method in the second
layer because of its flexibility in adjusting the penalty terms during
regularization and its time efficiency. The hybrid two-layer approach has been
applied to a maize genetic dataset from the NAM population, which consists of
multiple subsets with different ratios of the number of predictors to the
number of observations. The numerical results confirm the superiority of the
proposed model.
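The two-layer idea above can be sketched in code: a simple GA evolves binary feature masks against cross-validated prediction error (layer 1), then an Elastic Net fit on the GA's best subset zeroes out whatever redundant predictors remain (layer 2). This is a minimal illustration on synthetic data, not the paper's implementation; the population size, mutation rate, and Elastic Net penalties (`alpha`, `l1_ratio`) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic data: 200 observations, 50 predictors, only the first 5 informative.
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 2.5, -1.0]) + rng.normal(scale=0.5, size=n)

def fitness(mask):
    """Negative CV MSE of a linear model on the masked predictors,
    minus a small penalty on subset size (illustrative trade-off)."""
    if mask.sum() == 0:
        return -np.inf
    score = cross_val_score(LinearRegression(), X[:, mask], y,
                            scoring="neg_mean_squared_error", cv=3).mean()
    return score - 0.01 * mask.sum()

# Layer 1: a minimal GA over binary feature masks.
pop_size, generations = 20, 15
pop = rng.random((pop_size, p)) < 0.5
for _ in range(generations):
    scores = np.array([fitness(m) for m in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]          # truncation selection
    children = []
    for _ in range(pop_size // 2):
        a = parents[rng.integers(len(parents))]
        b = parents[rng.integers(len(parents))]
        cut = rng.integers(1, p)                   # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child = child ^ (rng.random(p) < 0.02)     # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best_mask = pop[np.argmax([fitness(m) for m in pop])]

# Layer 2: Elastic Net prunes redundant predictors the GA kept.
en = ElasticNet(alpha=0.05, l1_ratio=0.7).fit(X[:, best_mask], y)
kept = np.flatnonzero(best_mask)[np.abs(en.coef_) > 1e-6]
print(f"GA kept {best_mask.sum()} predictors; Elastic Net kept {kept.size}")
```

In practice the GA's fitness would use the downstream predictive model, and the Elastic Net penalties would be tuned by cross-validation (e.g. `ElasticNetCV`); the point of the second layer is that the l1 component can zero out coefficients the wrapper search failed to discard.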
Related papers
- Exploiting Diffusion Prior for Generalizable Dense Prediction [85.4563592053464]
Recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate.
We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks.
Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-30T18:59:44Z) - Subject-specific Deep Neural Networks for Count Data with
High-cardinality Categorical Features [1.2289361708127877]
We propose a novel hierarchical likelihood learning framework for introducing gamma random effects into a Poisson deep neural network.
The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects.
State-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework.
arXiv Detail & Related papers (2023-10-18T01:54:48Z) - Structured Radial Basis Function Network: Modelling Diversity for
Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important in forecasting nonstationary processes or with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Efficient first-order predictor-corrector multiple objective
optimization for fair misinformation detection [5.139559672771439]
Multiple-objective optimization (MOO) aims to simultaneously optimize multiple conflicting objectives and has found important applications in machine learning.
We propose a Gauss-Newton approximation that scales only linearly and requires only first-order inner products per iteration.
The innovations make predictor-corrector possible for large networks.
arXiv Detail & Related papers (2022-09-15T12:32:15Z) - Consensual Aggregation on Random Projected High-dimensional Features for
Regression [0.0]
We present a study of a kernel-based consensual aggregation on randomly projected high-dimensional features of predictions for regression.
We numerically illustrate that the aggregation scheme upholds its performance on very large and highly correlated features.
The efficiency of the proposed method is illustrated through several experiments evaluated on different types of synthetic and real datasets.
arXiv Detail & Related papers (2022-04-06T06:35:47Z) - Accelerating Stochastic Probabilistic Inference [1.599072005190786]
Stochastic Variational Inference (SVI) has been increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models.
Almost all the state-of-the-art SVI algorithms are based on first-order optimization and often suffer from poor convergence rate.
We bridge the gap between second-order methods and variational inference by proposing a second-order based variational inference approach.
arXiv Detail & Related papers (2022-03-15T01:19:12Z) - Efficient and Differentiable Conformal Prediction with General Function
Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z) - Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
arXiv Detail & Related papers (2020-04-06T13:19:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences.