On the utility of feature selection in building two-tier decision trees
- URL: http://arxiv.org/abs/2212.14448v1
- Date: Thu, 29 Dec 2022 20:10:45 GMT
- Title: On the utility of feature selection in building two-tier decision trees
- Authors: Sergey A. Saltykov
- Abstract summary: It is demonstrated that the synergistic effect by which complementary features mutually amplify each other in the construction of two-tier decision trees can be disrupted by another feature.
Removing the interfering feature can improve performance by up to 24 times.
It is concluded that this broadens the scope of feature selection methods to cases where data and computational resources are sufficient.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Nowadays, feature selection is frequently used in machine learning when there
is a risk of performance degradation due to overfitting or when computational
resources are limited. During the feature selection process, the subset of
features that are most relevant and least redundant is chosen. In recent years,
it has become clear that, in addition to relevance and redundancy, features'
complementarity must be considered. Informally, if the features are weak
predictors of the target variable separately and strong predictors when
combined, then they are complementary. It is demonstrated in this paper that
the synergistic effect of complementary features mutually amplifying each other
in the construction of two-tier decision trees can be interfered with by
another feature, resulting in a decrease in performance. It is demonstrated
using cross-validation on both synthetic and real datasets, for both regression and
classification, that removing the interfering feature can
improve performance by up to 24 times. It has also been discovered that the
less well the domain has been learned, the greater the performance gain. More
formally, it is demonstrated that there is a statistically significant negative
rank correlation between performance on the dataset prior to the elimination of
the interfering feature and performance growth after the elimination of the
interfering feature. It is concluded that this broadens the scope of feature
selection methods for cases where data and computational resources are
sufficient.
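As a sanity check on the mechanism the abstract describes, the toy sketch below builds a hypothetical greedy depth-2 ("two-tier") tree on binary features: an XOR-style complementary pair (each feature useless alone, perfect together) plus a noisy "interfering" feature that is weakly predictive on its own. Greedy splitting picks the interferer at the root and caps accuracy; dropping it lets the complementary pair reach a perfect fit. This is not the paper's implementation; all names, the 35% noise level, and the sample size are illustrative assumptions.

```python
import random

def gini(labels):
    # Gini impurity of a binary label list
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, features):
    # pick the binary feature whose split minimizes weighted Gini impurity
    best_f, best_imp = features[0], float("inf")
    for f in features:
        left  = [yi for xi, yi in zip(X, y) if xi[f] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[f] == 1]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp - 1e-12:
            best_f, best_imp = f, imp
    return best_f

def majority(labels):
    return int(sum(labels) * 2 >= len(labels))

def two_tier_tree(X, y, features):
    # greedy depth-2 tree: one root split, then one split per branch
    root = best_split(X, y, features)
    tree = {"root": root, "branches": {}}
    for v in (0, 1):
        Xb = [xi for xi in X if xi[root] == v]
        yb = [yi for xi, yi in zip(X, y) if xi[root] == v]
        f2 = best_split(Xb, yb, [f for f in features if f != root])
        leaves = {u: majority([yi for xi, yi in zip(Xb, yb) if xi[f2] == u] or [0])
                  for u in (0, 1)}
        tree["branches"][v] = (f2, leaves)
    return tree

def predict(tree, xi):
    f2, leaves = tree["branches"][xi[tree["root"]]]
    return leaves[xi[f2]]

random.seed(0)
n = 2000
X, y = [], []
for _ in range(n):
    x1, x2 = random.randint(0, 1), random.randint(0, 1)
    target = x1 ^ x2                                       # complementary pair: weak alone, perfect together
    x3 = target if random.random() > 0.35 else 1 - target  # interferer: weakly predictive alone
    X.append((x1, x2, x3))
    y.append(target)

tree_all = two_tier_tree(X, y, [0, 1, 2])  # interferer wins the root split
tree_sel = two_tier_tree(X, y, [0, 1])     # interferer removed by "feature selection"
acc = lambda t: sum(predict(t, xi) == yi for xi, yi in zip(X, y)) / n
print(f"with interferer: {acc(tree_all):.2f}, without: {acc(tree_sel):.2f}")
```

With the interferer present, the root split "spends" one of the two tiers on the weakly predictive feature, so neither branch can combine both XOR features, and accuracy stalls near the interferer's own predictive power; with it removed, the first tier splits on one complementary feature and the second tier completes the pair.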
Related papers
- Fairness-Aware Streaming Feature Selection with Causal Graphs [10.644488289941021]
Streaming Feature Selection with Causal Fairness builds causal graphs egocentric to the prediction label and the protected feature.
We benchmark SFCF on five datasets widely used in streaming feature research.
arXiv Detail & Related papers (2024-08-17T00:41:02Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Copula for Instance-wise Feature Selection and Ranking [24.09326839818306]
We propose to incorporate the Gaussian copula, a powerful mathematical technique for capturing correlations between variables, into the current feature selection framework.
Experimental results on both synthetic and real datasets, in terms of performance comparison and interpretability, demonstrate that our method is capable of capturing meaningful correlations.
arXiv Detail & Related papers (2023-08-01T13:45:04Z)
- Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction [52.63663547523033]
Late interaction, the simplest form of multi-vector, is also helpful to neural rerankers that only use the [] vector to compute the similarity score.
We show that the finding is consistent across different model sizes and first-stage retrievers of diverse natures.
arXiv Detail & Related papers (2023-02-13T18:42:17Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- Deep Unsupervised Feature Selection by Discarding Nuisance and Correlated Features [7.288137686773523]
Modern datasets contain large subsets of correlated features and nuisance features.
In the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features.
We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features.
arXiv Detail & Related papers (2021-10-11T14:26:13Z)
- Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Multi-Objective Evolutionary approach for the Performance Improvement of Learners using Ensembling Feature selection and Discretization Technique on Medical data [8.121462458089143]
This paper proposes a novel multi-objective based dimensionality reduction framework.
It incorporates both discretization and feature reduction as an ensemble model for performing feature selection and discretization.
arXiv Detail & Related papers (2020-04-16T06:32:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.