EFI: A Toolbox for Feature Importance Fusion and Interpretation in Python
- URL: http://arxiv.org/abs/2208.04343v1
- Date: Mon, 8 Aug 2022 18:02:37 GMT
- Title: EFI: A Toolbox for Feature Importance Fusion and Interpretation in Python
- Authors: Aayush Kumar, Jimiama Mafeni Mase, Divish Rengasamy, Benjamin
Rothwell, Mercedes Torres Torres, David A. Winkler, Grazziela P. Figueredo
- Abstract summary: Ensemble Feature Importance (EFI) is an open-source Python toolbox for machine learning (ML) researchers, domain experts, and decision makers.
EFI provides robust and accurate feature importance quantification and more reliable mechanistic interpretation of feature importance for prediction problems.
- Score: 1.593222804814135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an open-source Python toolbox called Ensemble Feature
Importance (EFI) to provide machine learning (ML) researchers, domain experts,
and decision makers with robust and accurate feature importance quantification
and more reliable mechanistic interpretation of feature importance for
prediction problems using fuzzy sets. The toolkit was developed to address
uncertainties in feature importance quantification and lack of trustworthy
feature importance interpretation due to the diverse availability of machine
learning algorithms, feature importance calculation methods, and dataset
dependencies. EFI merges results from multiple machine learning models with
different feature importance calculation approaches using data bootstrapping
and decision fusion techniques, such as mean, majority voting and fuzzy logic.
The main attributes of the EFI toolbox are: (i) automatic optimisation of ML
algorithms, (ii) automatic computation of a set of feature importance
coefficients from optimised ML algorithms and feature importance calculation
techniques, (iii) automatic aggregation of importance coefficients using
multiple decision fusion techniques, and (iv) fuzzy membership functions that
show the importance of each feature to the prediction task. The key modules and
functions of the toolbox are described, and a simple example of their
application is presented using the popular Iris dataset.
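The fusion step described in the abstract (combining importance coefficients from several model and method pairs by mean, majority voting, and fuzzy membership) can be sketched in plain Python. This is a minimal illustration, not the EFI toolbox API: the function names, the min-max normalisation, the top-k voting rule, and the triangular "low / moderate / high" membership shapes are all assumptions made for this example, and the input numbers are made-up toy values rather than results from the paper. In practice each run would come from a model trained on a bootstrap sample of the data.

```python
from statistics import mean


def min_max_normalise(coeffs):
    """Scale the absolute values of one run's coefficients to [0, 1]."""
    mags = [abs(c) for c in coeffs]
    lo, hi = min(mags), max(mags)
    if hi == lo:
        # All features equally important in this run: no ranking information.
        return [0.0 for _ in mags]
    return [(m - lo) / (hi - lo) for m in mags]


def mean_fusion(runs):
    """Average the normalised importance of each feature across all runs."""
    normalised = [min_max_normalise(run) for run in runs]
    return [mean(column) for column in zip(*normalised)]


def majority_vote_fusion(runs, top_k):
    """Return, per feature, the fraction of runs ranking it in the top_k."""
    n_features = len(runs[0])
    votes = [0] * n_features
    for run in runs:
        ranked = sorted(range(n_features), key=lambda i: abs(run[i]), reverse=True)
        for i in ranked[:top_k]:
            votes[i] += 1
    return [v / len(runs) for v in votes]


def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_labels(score):
    """Degree to which a fused score in [0, 1] is low, moderate, or high.

    The three overlapping triangles are a hypothetical choice for this sketch.
    """
    return {
        "low": triangular(score, -0.5, 0.0, 0.5),
        "moderate": triangular(score, 0.0, 0.5, 1.0),
        "high": triangular(score, 0.5, 1.0, 1.5),
    }


# Toy coefficients for four features from three hypothetical
# (model, importance method) runs. Illustrative values only.
runs = [
    [1.0, 0.0, 3.0, 4.0],
    [2.0, -1.0, 5.0, 3.0],
    [0.0, 2.0, 8.0, 4.0],
]

fused = mean_fusion(runs)
voted = majority_vote_fusion(runs, top_k=2)
for index, (score, vote) in enumerate(zip(fused, voted)):
    print(index, round(score, 3), vote, fuzzy_labels(score))
```

Normalising each run before averaging keeps a method with large raw coefficients from dominating the fused score, which is one common way to make crisp fusion comparable across methods; the toolbox itself may handle scaling differently.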
Related papers
- LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation [50.98046887582194]
We propose an explicit and implicit multi-interest learning framework to model user interests on two levels: behavior and semantics.
The proposed EIMF framework effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
- Efficient Network Traffic Feature Sets for IoT Intrusion Detection [0.0]
This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, in multiple IoT network datasets.
The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection.
arXiv Detail & Related papers (2024-06-12T09:51:29Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Metric Tools for Sensitivity Analysis with Applications to Neural Networks [0.0]
Explainable Artificial Intelligence (XAI) aims to provide interpretations for predictions made by Machine Learning models.
In this paper, a theoretical framework is proposed to study sensitivities of ML models using metric techniques.
A complete family of new quantitative metrics called $\alpha$-curves is extracted.
arXiv Detail & Related papers (2023-05-03T18:10:21Z)
- Mechanistic Interpretation of Machine Learning Inference: A Fuzzy Feature Importance Fusion Approach [0.39146761527401425]
There is a lack of consensus regarding how feature importance should be quantified.
Current state-of-the-art ensemble feature importance fusion uses crisp techniques to fuse results from different approaches.
Here we show how the use of fuzzy data fusion methods can overcome some of the important limitations of crisp fusion methods.
arXiv Detail & Related papers (2021-10-22T11:22:21Z)
- AEFE: Automatic Embedded Feature Engineering for Categorical Features [4.310748698480341]
We propose an automatic feature engineering framework for representing categorical features, which consists of various components including custom paradigm feature construction and multiple feature selection.
Experiments conducted on some typical e-commerce datasets indicate that our method outperforms the classical machine learning models and state-of-the-art deep learning models.
arXiv Detail & Related papers (2021-10-19T07:22:59Z)
- Feature Weighted Non-negative Matrix Factorization [92.45013716097753]
We propose Feature Weighted Non-negative Matrix Factorization (FNMF) in this paper.
FNMF learns the weights of features adaptively according to their importance.
It can be solved efficiently with the suggested optimization algorithm.
arXiv Detail & Related papers (2021-03-24T21:17:17Z)
- Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion [0.0]
We introduce a novel fusion metric and compare it to the state-of-the-art.
Our approach is tested on synthetic data, where the ground truth is known.
Results show that our feature importance ensemble framework produces 15% less feature importance error overall than existing methods.
arXiv Detail & Related papers (2020-09-11T15:51:52Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)