The impact of feature importance methods on the interpretation of defect
classifiers
- URL: http://arxiv.org/abs/2202.02389v1
- Date: Fri, 4 Feb 2022 21:00:59 GMT
- Authors: Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E.
Hassan
- Abstract summary: We evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers.
The feature importance ranks computed by the studied CA methods exhibit strong agreement, including on the features reported at the top-1 and top-3 ranks for a given dataset.
We demonstrate that removing these feature interactions, even with simple methods like CFS, improves agreement between the computed feature importance ranks of CA and CS methods.
- Score: 13.840006058766766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Classifier-specific (CS) and classifier-agnostic (CA) feature importance
methods are widely used (often interchangeably) by prior studies to derive
feature importance ranks from a defect classifier. However, different feature
importance methods are likely to compute different feature importance ranks
even for the same dataset and classifier. Hence, such interchangeable use of
feature importance methods can lead to conclusion instabilities unless there is
a strong agreement among different methods. Therefore, in this paper, we
evaluate the agreement between the feature importance ranks associated with the
studied classifiers through a case study of 18 software projects and six
commonly used classifiers. We find that: 1) the feature importance ranks
computed by CA and CS methods do not always strongly agree with each other; 2)
the feature importance ranks computed by the studied CA methods exhibit strong
agreement, including on the features reported at the top-1 and top-3 ranks for a
given dataset and classifier, while even the commonly used CS methods yield
vastly different feature importance ranks. Such findings raise concerns about the
stability of conclusions across replicated studies. We further observe that the
commonly used defect datasets are rife with feature interactions and these
feature interactions impact the computed feature importance ranks of the CS
methods (not the CA methods). We demonstrate that removing these feature
interactions, even with simple methods like CFS, improves agreement between the
computed feature importance ranks of CA and CS methods. In light of our
findings, we provide guidelines for stakeholders and practitioners when
performing model interpretation and directions for future research, e.g.,
future research is needed to investigate the impact of advanced feature
interaction removal methods on computed feature importance ranks of different
CS methods.
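As a rough illustration of what "agreement between feature importance ranks" can mean in practice, the sketch below compares two rankings of the same features. It is a minimal example under stated assumptions, not the authors' procedure: the abstract does not say which agreement measures were used, so Kendall's tau and a Jaccard-style top-k overlap are assumed here, and the feature names and rank values are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings (dict: feature -> rank, 1 = most important).

    Assumes both rankings cover the same features and contain no ties.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same features")
    features = sorted(rank_a)
    if len(features) < 2:
        raise ValueError("need at least two features to compare")
    concordant = discordant = 0
    for f, g in combinations(features, 2):
        # The sign tells whether both rankings order the pair the same way.
        sign = (rank_a[f] - rank_a[g]) * (rank_b[f] - rank_b[g])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(features) * (len(features) - 1) // 2
    return (concordant - discordant) / n_pairs


def top_k_overlap(rank_a, rank_b, k):
    """Jaccard overlap between the sets of features ranked in the top k."""
    top_a = {f for f, r in rank_a.items() if r <= k}
    top_b = {f for f, r in rank_b.items() if r <= k}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Hypothetical ranks for one dataset and classifier (not taken from the paper).
cs_ranks = {"loc": 1, "churn": 2, "complexity": 3, "authors": 4}  # e.g. a CS method
ca_ranks = {"churn": 1, "loc": 2, "complexity": 3, "authors": 4}  # e.g. a CA method

print(round(kendall_tau(cs_ranks, ca_ranks), 3))  # 0.667
print(top_k_overlap(cs_ranks, ca_ranks, 1))       # 0.0
print(top_k_overlap(cs_ranks, ca_ranks, 3))       # 1.0
```

In this toy case the two rankings disagree only on the order of the top two features, so the overall correlation stays fairly high while the top-1 overlap drops to zero. That is one reason to inspect agreement at specific rank cutoffs as well as across the full ranking.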
Related papers
- Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations [0.0]
This paper introduces a novel supervised classification strategy that integrates functional data analysis with tree-based methods.
We propose augmented versions of functional classification trees and functional random forests, incorporating a new tool for assessing the importance of functional principal components.
arXiv Detail & Related papers (2024-08-23T15:58:41Z)
- Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses the limitations of permutation importance (PI) in such cases.
Grouping variables statistically via clustering or some prior knowledge gains some power back.
We show that the approach extended with stacking controls the type-I error even with highly correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z)
- Comparing Explanation Methods for Traditional Machine Learning Models Part 2: Quantifying Model Explainability Faithfulness and Improvements with Dimensionality Reduction [0.0]
"Faithfulness" or "fidelity" refers to the correspondence between the assigned feature importance and the contribution of the feature to model performance.
This study is one of the first to quantify the improvement in explainability from limiting correlated features and knowing the relative fidelity of different explainability methods.
arXiv Detail & Related papers (2022-11-18T17:15:59Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
- MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
In this paper, we propose a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
- BCFNet: A Balanced Collaborative Filtering Network with Attention Mechanism [106.43103176833371]
Collaborative Filtering (CF) based recommendation methods have been widely studied.
We propose a novel recommendation model named Balanced Collaborative Filtering Network (BCFNet).
In addition, an attention mechanism is designed to better capture the hidden information within implicit feedback and strengthen the learning ability of the neural network.
arXiv Detail & Related papers (2021-03-10T14:59:23Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Linear Classifier Combination via Multiple Potential Functions [0.6091702876917279]
We propose a novel concept of calculating a scoring function based on the distance of the object from the decision boundary and its distance to the class centroid.
An important property is that the proposed score function has the same nature for all linear base classifiers.
arXiv Detail & Related papers (2020-10-02T08:11:51Z)
- Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion [0.0]
We introduce a novel fusion metric and compare it to the state-of-the-art.
Our approach is tested on synthetic data, where the ground truth is known.
Results show that our feature importance ensemble framework overall produces 15% less feature importance error compared to existing methods.
arXiv Detail & Related papers (2020-09-11T15:51:52Z)
- Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)