Ensemble- and Distance-Based Feature Ranking for Unsupervised Learning
- URL: http://arxiv.org/abs/2011.11679v1
- Date: Mon, 23 Nov 2020 19:17:24 GMT
- Title: Ensemble- and Distance-Based Feature Ranking for Unsupervised Learning
- Authors: Matej Petković, Dragi Kocev, Blaž Škrlj, Sašo Džeroski
- Abstract summary: We propose two novel (groups of) methods for unsupervised feature ranking and selection.
The first group includes feature ranking scores (Genie3 score, RandomForest score) that are computed from ensembles of predictive clustering trees.
The second method is URelief, the unsupervised extension of the Relief family of feature ranking algorithms.
- Score: 2.7921429800866533
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we propose two novel (groups of) methods for unsupervised
feature ranking and selection. The first group includes feature ranking scores
(Genie3 score, RandomForest score) that are computed from ensembles of
predictive clustering trees. The second method is URelief, the unsupervised
extension of the Relief family of feature ranking algorithms. Using 26
benchmark data sets and 5 baselines, we show that both the Genie3 score
(computed from the ensemble of extra trees) and the URelief method outperform
the existing methods and that Genie3 performs best overall, in terms of
predictive power of the top-ranked features. Additionally, we analyze the
influence of the hyper-parameters of the proposed methods on their performance,
and show that for the Genie3 score the highest quality is achieved by the most
efficient parameter configuration. Finally, we propose a way of determining
where in the ranking the features that are truly the most relevant are
located.
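The abstract does not spell out URelief's update rule. As a rough illustration of what a Relief-style score can look like without labels, the sketch below applies the RReliefF regression update with the "target" distance replaced by the mean scaled difference over all features. That substitution, the function name `urelief_sketch`, and the parameter `n_neighbors` are assumptions made for illustration, not the authors' definition.

```python
def urelief_sketch(rows, n_neighbors=3):
    """Relief-style feature scores for unlabeled data (illustrative sketch).

    rows: list of equal-length lists of numeric feature values.
    Returns one score per feature; higher suggests more relevance.
    """
    n, p = len(rows), len(rows[0])

    # Per-feature value range, used to scale differences into [0, 1].
    # A constant feature gets a span of 1.0 to avoid division by zero.
    spans = []
    for j in range(p):
        column = [row[j] for row in rows]
        spans.append((max(column) - min(column)) or 1.0)

    def diffs(a, b):
        return [abs(a[j] - b[j]) / spans[j] for j in range(p)]

    n_dc = 0.0            # accumulated "target" distance
    n_da = [0.0] * p      # accumulated per-feature distance
    n_dcda = [0.0] * p    # accumulated product of the two
    pairs = 0

    for r in range(n):
        # Assumption: the mean scaled difference over all features stands in
        # for the target distance and is also used to pick the neighbours.
        candidates = []
        for s in range(n):
            if s != r:
                d = diffs(rows[r], rows[s])
                candidates.append((sum(d) / p, d))
        candidates.sort(key=lambda item: item[0])

        for d_target, d in candidates[:n_neighbors]:
            n_dc += d_target
            for j in range(p):
                n_da[j] += d[j]
                n_dcda[j] += d_target * d[j]
            pairs += 1

    # RReliefF-style final score, guarded against empty denominators.
    scores = []
    for j in range(p):
        first = n_dcda[j] / n_dc if n_dc else 0.0
        rest = pairs - n_dc
        second = (n_da[j] - n_dcda[j]) / rest if rest > 0 else 0.0
        scores.append(first - second)
    return scores
```

Calling `urelief_sketch(rows, n_neighbors=2)` on a small list of numeric rows returns one float per column. This version visits every row instead of sampling, which keeps it deterministic but makes it quadratic in the number of rows.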
Related papers
- On the Evaluation Consistency of Attribution-based Explanations [42.1421504321572]
We introduce Meta-Rank, an open platform for benchmarking attribution methods in the image domain.
Our benchmark reveals three insights in attribution evaluation endeavors: 1) evaluating attribution methods under disparate settings can yield divergent performance rankings; 2) although inconsistent across numerous cases, the performance rankings exhibit remarkable consistency across distinct checkpoints along the same training trajectory; and 3) prior attempts at consistent evaluation fare no better than baselines when extended to more heterogeneous models and datasets.
arXiv Detail & Related papers (2024-07-28T11:49:06Z)
- RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank [28.438428292619577]
We adopt an axiomatic game-theoretic approach, popular in the feature attribution community, to identify a set of fundamental axioms that every ranking-based feature attribution method should satisfy.
We then introduce Rank-SHAP, extending classical Shapley values to ranking.
We also perform an axiomatic analysis of existing rank attribution algorithms to determine their compliance with our proposed axioms.
arXiv Detail & Related papers (2024-05-03T04:43:24Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- An Evolutionary Correlation-aware Feature Selection Method for Classification Problems [3.2550305883611244]
In this paper, an estimation of distribution algorithm is proposed to meet three goals.
Firstly, as an extension of EDA, the proposed method generates only two individuals in each iteration that compete based on a fitness function.
Secondly, we provide a guiding technique for determining the number of features for individuals in each iteration.
As the main contribution of the paper, in addition to considering the importance of each feature alone, the proposed method can consider the interaction between features.
arXiv Detail & Related papers (2021-10-16T20:20:43Z)
- Hierarchical Ranking for Answer Selection [19.379777219863964]
We propose a novel strategy for answer selection, called hierarchical ranking.
We introduce three levels of ranking: point-level ranking, pair-level ranking, and list-level ranking.
Experimental results on two public datasets, WikiQA and TREC-QA, demonstrate that the proposed hierarchical ranking is effective.
arXiv Detail & Related papers (2021-02-01T07:35:52Z)
- Feature Importance Ranking for Deep Learning [7.287652818214449]
We propose a novel dual-net architecture consisting of operator and selector for discovery of an optimal feature subset of a fixed size.
During learning, the operator is trained for a supervised learning task via optimal feature subset candidates generated by the selector.
In deployment, the selector generates an optimal feature subset and ranks feature importance, while the operator makes predictions based on the optimal subset for test data.
arXiv Detail & Related papers (2020-10-18T12:20:27Z)
- Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender systems.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
- PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification [55.887502438160304]
The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction.
We further improve the PointHop method in two aspects: 1) reducing its model complexity in terms of the number of model parameters and 2) ordering discriminant features automatically based on the cross-entropy criterion.
With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods.
arXiv Detail & Related papers (2020-02-09T04:49:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.