A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection
- URL: http://arxiv.org/abs/2311.18746v1
- Date: Thu, 30 Nov 2023 17:44:22 GMT
- Title: A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection
- Authors: Uchechukwu F. Njoku, Alberto Abelló, Besim Bilalli, Gianluca Bontempi
- Abstract summary: Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task.
This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many-Objective Feature Selection (MOFS) approaches use four or more
objectives to determine the relevance of a subset of features in a supervised
learning task. As a consequence, MOFS typically returns a large set of
non-dominated solutions, which have to be assessed by the data scientist in
order to proceed with the final choice. Given the multi-variate nature of the
assessment, which may include criteria (e.g. fairness) not related to
predictive accuracy, this step is often not straightforward and suffers from
the lack of existing tools. For instance, it is common to use a tabular
presentation of the solutions, which provides little information about the
trade-offs and the relations between criteria over the set of solutions.
This paper proposes an original methodology to support data scientists in the
interpretation and comparison of the MOFS outcome by combining post-processing
and visualisation of the set of solutions. The methodology supports the data
scientist in the selection of an optimal feature subset by providing her with
high-level information at three different levels: objectives, solutions, and
individual features.
The methodology is experimentally assessed on two feature selection tasks
adopting a GA-based MOFS with six objectives (number of selected features,
balanced accuracy, F1-Score, variance inflation factor, statistical parity, and
equalised odds). The results show the added value of the methodology in the
selection of the final subset of features.
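As an illustration of the post-processing step the abstract describes, the sketch below filters a set of candidate feature subsets down to the non-dominated (Pareto) solutions that a data scientist would then have to compare. The objective values are invented placeholders, not results from the paper; all six objectives are expressed so that smaller is better.

```python
# Minimal sketch: extracting the non-dominated set from candidate solutions
# in many-objective feature selection. Objective vectors are illustrative
# placeholders (not from the paper); all objectives are to be minimised.

def dominates(a, b):
    """True if solution a dominates b: no worse on every objective,
    strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Return the Pareto front of a list of objective vectors."""
    return [a for i, a in enumerate(solutions)
            if not any(dominates(b, a)
                       for j, b in enumerate(solutions) if j != i)]

# Hypothetical objective vectors, one per candidate feature subset:
# (n_features, 1 - balanced accuracy, 1 - F1-score, VIF,
#  statistical parity difference, equalised odds difference)
candidates = [
    (5, 0.20, 0.22, 1.8, 0.05, 0.04),
    (8, 0.15, 0.18, 2.5, 0.07, 0.06),
    (5, 0.25, 0.30, 3.0, 0.10, 0.09),  # dominated by the first candidate
]

front = non_dominated(candidates)
print(front)  # the first two candidates survive; the third is dominated
```

In practice such fronts contain many more solutions, which is exactly why the paper argues for post-processing and visualisation at the level of objectives, solutions, and individual features rather than a raw tabular dump.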
Related papers
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection [50.30291173608449]
We propose a novel multi-objective meta-learning network (M$3$BS) for zero-shot hyperspectral band selection.
In M$3$BS, a generalizable graph convolution network (GCN) is constructed to generate a dataset-agnostic base.
The acquired meta-knowledge can be directly transferred to unseen datasets without any retraining or fine-tuning.
arXiv Detail & Related papers (2024-06-12T07:13:31Z)
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features show between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Multi-Objective Genetic Algorithm for Multi-View Feature Selection [0.23343923880060582]
We propose a novel genetic algorithm strategy to overcome limitations of traditional feature selection methods for multi-view data.
Our proposed approach, called the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within a view and between views.
The results of our evaluations on three benchmark datasets, including synthetic and real data, show improvement over the best baseline methods.
arXiv Detail & Related papers (2023-05-26T13:25:20Z) - A User-Guided Bayesian Framework for Ensemble Feature Selection in Life
Science Applications (UBayFS) [0.0]
We propose UBayFS, an ensemble feature selection technique, embedded in a Bayesian statistical framework.
Our approach enhances the feature selection process by considering two sources of information: data and domain knowledge.
A comparison with standard feature selectors underlines that UBayFS achieves competitive performance, while providing additional flexibility to incorporate domain knowledge.
arXiv Detail & Related papers (2021-04-30T06:51:33Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - Feature Selection for Huge Data via Minipatch Learning [0.0]
We propose Stable Minipatch Selection (STAMPS) and Adaptive STAMPS.
STAMPS are meta-algorithms that build ensembles of selection events of base feature selectors trained on tiny, possibly-adaptive random subsets of both the observations and features of the data.
Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques.
arXiv Detail & Related papers (2020-10-16T17:41:08Z) - Joint Adaptive Graph and Structured Sparsity Regularization for
Unsupervised Feature Selection [6.41804410246642]
We propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method.
A subset of optimal features will be selected in group, and the number of selected features will be determined automatically.
Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2020-10-09T08:17:04Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.