A novel evaluation methodology for supervised Feature Ranking algorithms
- URL: http://arxiv.org/abs/2207.04258v1
- Date: Sat, 9 Jul 2022 12:00:36 GMT
- Title: A novel evaluation methodology for supervised Feature Ranking algorithms
- Authors: Jeroen G. S. Overschie
- Abstract summary: This paper proposes a new evaluation methodology for Feature Rankers.
By making use of synthetic datasets, feature importance scores can be known beforehand, allowing more systematic evaluation.
To facilitate large-scale experimentation using the new methodology, a benchmarking framework was built in Python, called fseval.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Both in the domains of Feature Selection and Interpretable AI, there exists a desire to `rank' features based on their importance. Such feature importance rankings can then be used either to (1) reduce the dataset size or (2) interpret the Machine Learning model. In the literature, however, Feature Rankers are not evaluated in a systematic, consistent way: papers argue in many different ways which feature importance ranker works best. This paper fills this gap by proposing a new evaluation methodology. By making use of synthetic datasets, feature importance scores can be known beforehand, allowing a more systematic evaluation. To facilitate large-scale experimentation using the new methodology, a benchmarking framework called fseval was built in Python. The framework can run experiments in parallel and distributed over machines on HPC systems. By integrating with the online platform Weights and Biases, charts can be explored interactively on a live dashboard. The software was released as open source and is published as a package on PyPI. The research concludes by exploring one such large-scale experiment to find the strengths and weaknesses of the participating algorithms on many fronts.
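Since the abstract does not reproduce fseval's API, the snippet below is only a minimal sketch of the evaluation idea using scikit-learn: generate a synthetic classification dataset in which the informative features are known beforehand, rank all features with a stand-in ranker (random-forest impurity importances), and score the resulting ranking against that ground truth. All names, the choice of ranker, and the metrics are illustrative assumptions, not fseval's interface; the paper's synthetic datasets may expose richer ground-truth importance scores than the binary informative/non-informative labels used here.

```python
# Minimal sketch of the evaluation idea (illustrative only; not the fseval API).
# Assumption: a synthetic dataset where the informative features are known,
# a stand-in feature ranker, and rank-based scores against that ground truth.
import numpy as np
from scipy.stats import kendalltau
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset: with shuffle=False the informative features occupy the
# first `n_informative` columns, so a ground-truth relevance vector is known.
n_features, n_informative = 20, 5
X, y = make_classification(
    n_samples=1000,
    n_features=n_features,
    n_informative=n_informative,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
ground_truth = np.array([1.0] * n_informative + [0.0] * (n_features - n_informative))

# Stand-in feature ranker: random-forest impurity importances.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X, y)
estimated_importance = ranker.feature_importances_

# Compare the estimated ranking with the known ground truth.
tau, _ = kendalltau(ground_truth, estimated_importance)
print(f"Kendall tau between estimated and known importances: {tau:.3f}")

# Another common check: do the top-k ranked features recover the informative ones?
top_k = np.argsort(estimated_importance)[::-1][:n_informative]
recall_at_k = np.mean([i < n_informative for i in top_k])
print(f"Recall@{n_informative}: {recall_at_k:.2f}")
```

In a full benchmark this loop would be repeated over many rankers, datasets, and random seeds, which is the kind of large-scale, parallel experimentation on HPC systems that the abstract says fseval automates.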
Related papers
- ShaRP: A Novel Feature Importance Framework for Ranking [6.753981445665063]
We present ShaRP - Shapley for Rankings and Preferences - a framework that explains the contributions of features to different aspects of a ranked outcome.
ShaRP builds on the Quantitative Input Influence framework to compute the contributions of features for multiple, ranking-specific, Quantities of Interest.
We show the results of an extensive experimental validation of ShaRP using real and synthetic datasets.
arXiv Detail & Related papers (2024-01-30T04:48:43Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- piRank: A Probabilistic Intent Based Ranking Framework for Facebook Search [0.07614628596146598]
We propose a probabilistic intent-based ranking framework (piRank for short) to address various ranking issues for different query intents.
We conducted extensive experiments and studies on top of the Facebook search engine system and validated the effectiveness of this new ranking architecture.
arXiv Detail & Related papers (2022-03-27T18:12:56Z)
- What are the best systems? New perspectives on NLP Benchmarking [10.27421161397197]
We propose a new procedure to rank systems based on their performance across different tasks.
Motivated by social choice theory, the final system ordering is obtained by aggregating the rankings induced by each task (a toy aggregation sketch follows this entry).
We show that our method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure.
arXiv Detail & Related papers (2022-02-08T11:44:20Z)
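The sketch referenced in the entry above: the summary does not state which social-choice aggregation rule that paper uses, so the following is only a hypothetical Borda-count illustration over made-up per-task system rankings, not the paper's method.

```python
# Hypothetical illustration of rank aggregation via Borda count; the cited
# paper's actual aggregation rule may differ.
from collections import defaultdict

# Made-up example: each task induces a ranking of systems, best first.
task_rankings = {
    "task_A": ["sys1", "sys2", "sys3"],
    "task_B": ["sys2", "sys1", "sys3"],
    "task_C": ["sys2", "sys3", "sys1"],
}

# Borda count: the system ranked at position p among n systems gets n - 1 - p points.
scores = defaultdict(int)
for ranking in task_rankings.values():
    n = len(ranking)
    for position, system in enumerate(ranking):
        scores[system] += n - 1 - position

# Final ordering: highest aggregate score first.
final_order = sorted(scores, key=scores.get, reverse=True)
print(final_order)  # ['sys2', 'sys1', 'sys3']
```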
- Neural Code Summarization: How Far Are We? [30.324396716447602]
Deep learning techniques have been exploited to automatically generate summaries for given code snippets.
In this paper, we conduct a systematic and in-depth analysis of five state-of-the-art neural source code summarization models.
arXiv Detail & Related papers (2021-07-15T04:33:59Z)
- Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning [66.59455427102152]
We introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
Each baseline is a self-contained experiment pipeline with easily reusable and extendable components.
We provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results.
arXiv Detail & Related papers (2021-06-07T23:57:32Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies [65.92826041406802]
We propose a Proxy-based deep Graph Metric Learning approach from the perspective of graph classification.
Multiple global proxies are leveraged to collectively approximate the original data points for each class.
We design a novel reverse label propagation algorithm, by which the neighbor relationships are adjusted according to ground-truth labels.
arXiv Detail & Related papers (2020-10-26T14:52:42Z)
- Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)