A model-free feature selection technique of feature screening and random
forest based recursive feature elimination
- URL: http://arxiv.org/abs/2302.07449v1
- Date: Wed, 15 Feb 2023 03:39:16 GMT
- Title: A model-free feature selection technique of feature screening and random
forest based recursive feature elimination
- Authors: Siwei Xia and Yuehan Yang
- Abstract summary: We propose a model-free feature selection method for ultra-high dimensional data with mass features.
We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a model-free feature selection method for
ultra-high dimensional data with mass features. It is a two-phase procedure
that combines the fused Kolmogorov filter with random forest based RFE to
remove model limitations and reduce the computational complexity. The
method is fully nonparametric and can work with various types of datasets. It
is accurate, model-free, and computationally efficient, and it can be widely
used in practical problems, such as
multiclass classification, nonparametric regression, and Poisson regression,
among others. We show that the proposed method is selection consistent and
$L_2$ consistent under weak regularity conditions. We further demonstrate the
superior performance of the proposed method over other existing methods by
simulations and real data examples.
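As a rough illustration of the screening phase only, the sketch below computes a fused Kolmogorov filter score per feature: the response is cut into quantile slices under several slicing schemes, each feature is scored by the largest two-sample Kolmogorov-Smirnov distance between any two slices, and the scores are summed over schemes. The function names, the slice counts `(3, 4, 5)`, the cutoff `keep=5`, and the synthetic data are all assumptions made for this example and are not taken from the paper. The second phase (random forest based RFE on the retained features) is not shown.

```python
import math
import random
from bisect import bisect_right
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    # Both empirical CDFs are step functions, so the supremum is hit at a data point.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def slice_response(y, n_slices):
    """Label each observation with its quantile slice of y (0 .. n_slices - 1)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def kolmogorov_filter(X, labels):
    """Per feature: the largest KS distance between any two slices."""
    groups = {}
    for row, s in zip(X, labels):
        groups.setdefault(s, []).append(row)
    scores = []
    for j in range(len(X[0])):
        cols = [[row[j] for row in rows] for rows in groups.values()]
        scores.append(max(ks_statistic(u, v) for u, v in combinations(cols, 2)))
    return scores


def fused_kolmogorov_filter(X, y, slice_counts=(3, 4, 5)):
    """Sum the filter scores over several slicing schemes (assumed counts)."""
    total = [0.0] * len(X[0])
    for g in slice_counts:
        scores = kolmogorov_filter(X, slice_response(y, g))
        total = [t + s for t, s in zip(total, scores)]
    return total


# Synthetic data (assumption): 30 features, only feature 0 drives the response,
# through a nonlinear monotone link that a linear screen is not tailored to.
rng = random.Random(0)
n, p = 120, 30
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [math.exp(2 * row[0]) for row in X]

scores = fused_kolmogorov_filter(X, y)
# Keep the top-scoring features; ties keep the lower index (stable sort).
ranked = sorted(range(p), key=lambda j: -scores[j])
selected = sorted(ranked[:5])
print("retained feature indices:", selected)
```

Because the score depends only on the ranks of the response within each slicing scheme, no regression or classification model is fitted at this stage. For a categorical response, the class labels could be passed to `kolmogorov_filter` directly in place of quantile slices.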
Related papers
- Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z) - Multi-Response Heteroscedastic Gaussian Process Models and Their
Inference [1.52292571922932]
We propose a novel framework for the modeling of heteroscedastic covariance functions.
We employ variational inference to approximate the posterior and facilitate posterior predictive modeling.
We show that our proposed framework offers a robust and versatile tool for a wide array of applications.
arXiv Detail & Related papers (2023-08-29T15:06:47Z) - Random Manifold Sampling and Joint Sparse Regularization for Multi-label
Feature Selection [0.0]
The model proposed in this paper can obtain the most relevant few features by solving the joint constrained optimization problems of $\ell_{2,1}$ and $\ell_F$ regularization.
Comparative experiments on real-world data sets show that the proposed method outperforms other methods.
arXiv Detail & Related papers (2022-04-13T15:06:12Z) - Flexible variable selection in the presence of missing data [0.0]
We propose a non-parametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data.
We show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance.
arXiv Detail & Related papers (2022-02-25T21:41:03Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - On the Adversarial Robustness of LASSO Based Feature Selection [72.54211869067979]
In the considered model, there is a malicious adversary who can observe the whole dataset, and then will carefully modify the response values or the feature matrix.
We formulate the modification strategy of the adversary as a bi-level optimization problem.
Numerical examples with synthetic and real data illustrate that our method is efficient and effective.
arXiv Detail & Related papers (2020-10-20T05:51:26Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z) - Robust Multi-class Feature Selection via $l_{2,0}$-Norm Regularization
Minimization [6.41804410246642]
Feature selection is an important computational process in data mining and machine learning.
In this paper, a novel method based on homotopy iterative hard threshold (HIHT) is proposed to solve the least square problem for multi-class feature selection.
arXiv Detail & Related papers (2020-10-08T02:06:06Z) - A Bayesian Approach with Type-2 Student-t Membership Function for T-S
Model Identification [47.25472624305589]
Fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data.
An innovative architecture for the fuzzy c-regression model is presented, and a novel Student-t distribution based membership function is designed for sparse data modelling.
arXiv Detail & Related papers (2020-09-02T05:10:13Z) - Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.