Feature construction using explanations of individual predictions
- URL: http://arxiv.org/abs/2301.09631v1
- Date: Mon, 23 Jan 2023 18:59:01 GMT
- Title: Feature construction using explanations of individual predictions
- Authors: Boštjan Vouk, Matej Guid, Marko Robnik-Šikonja
- Abstract summary: We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models.
We empirically show that restricting the search to groups of co-occurring attributes exposed by these explanations significantly reduces the time of feature construction.
We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature construction can contribute to comprehensibility and performance of
machine learning models. Unfortunately, it usually requires exhaustive search
in the attribute space or time-consuming human involvement to generate
meaningful features. We propose a novel heuristic approach for reducing the
search space based on aggregation of instance-based explanations of predictive
models. The proposed Explainable Feature Construction (EFC) methodology
identifies groups of co-occurring attributes exposed by popular explanation
methods, such as IME and SHAP. We empirically show that reducing the search to
these groups significantly reduces the time of feature construction using
logical, relational, Cartesian, numerical, and threshold num-of-N and X-of-N
constructive operators. An analysis on 10 transparent synthetic datasets shows
that EFC effectively identifies informative groups of attributes and constructs
relevant features. Using 30 real-world classification datasets, we show
significant improvements in classification accuracy for several classifiers and
demonstrate the feasibility of the proposed feature construction even for large
datasets. Finally, EFC generated interpretable features on a real-world problem
from the financial industry, which were confirmed by a domain expert.
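The idea of restricting the search to co-occurring attributes can be illustrated with a minimal sketch. This is not the authors' EFC implementation: the attribution values, the `noise` and `min_support` thresholds, and the helper names `cooccurring_groups` and `candidate_pairs` are all assumptions made for illustration. The sketch treats an attribute as active in an instance when its absolute attribution exceeds a noise threshold, counts which attribute sets are active together, and then enumerates candidate attribute pairs only inside the frequent groups.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-instance attributions, such as SHAP or IME might produce.
# Rows are instances, columns follow `names`; the values are made up.
names = ["A1", "A2", "A3", "A4"]
attributions = [
    [0.40, -0.35, 0.02, 0.00],
    [0.31, 0.28, -0.01, 0.03],
    [-0.45, 0.38, 0.00, 0.01],
    [0.02, 0.01, 0.30, 0.00],
]

def cooccurring_groups(attributions, names, noise=0.1, min_support=0.5):
    """Return attribute sets that are jointly active in enough instances.

    An attribute is 'active' when |attribution| > noise (assumed threshold).
    A set is kept when it occurs in at least min_support of all instances.
    """
    counts = Counter()
    for row in attributions:
        active = frozenset(n for n, a in zip(names, row) if abs(a) > noise)
        if len(active) >= 2:  # a single attribute offers nothing to combine
            counts[active] += 1
    cutoff = min_support * len(attributions)
    return [sorted(group) for group, c in counts.items() if c >= cutoff]

def candidate_pairs(groups):
    """Enumerate attribute pairs within each group only."""
    pairs = set()
    for group in groups:
        pairs.update(combinations(group, 2))
    return sorted(pairs)

groups = cooccurring_groups(attributions, names)
restricted = candidate_pairs(groups)
exhaustive = list(combinations(names, 2))

print("groups:", groups)
print("pairs to search:", len(restricted), "instead of", len(exhaustive))
```

In this toy setting a constructive operator (for example a logical AND or a Cartesian product) would be evaluated on one attribute pair rather than on all six, which is the kind of search-space reduction the abstract describes.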
Related papers
- Domain Specific Data Distillation and Multi-modal Embedding Generation [0.0]
The challenge of creating domain-centric embeddings arises from the abundance of unstructured data and the scarcity of domain-specific structured data.
This paper introduces a novel modeling approach that leverages structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction.
arXiv Detail & Related papers (2024-10-27T03:47:46Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- Rethinking Persistent Homology for Visual Recognition [27.625893409863295]
This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios.
We identify the scenarios that benefit the most from topological features, e.g., training simple networks on small datasets.
arXiv Detail & Related papers (2022-07-09T08:01:11Z)
- AEFE: Automatic Embedded Feature Engineering for Categorical Features [4.310748698480341]
We propose an automatic feature engineering framework for representing categorical features, which consists of various components including custom paradigm feature construction and multiple feature selection.
Experiments conducted on some typical e-commerce datasets indicate that our method outperforms the classical machine learning models and state-of-the-art deep learning models.
arXiv Detail & Related papers (2021-10-19T07:22:59Z)
- Adaptive Attribute and Structure Subspace Clustering Network [49.040136530379094]
We propose a novel self-expressiveness-based subspace clustering network.
We first consider an auto-encoder to represent input data samples.
Then, we construct a mixed signed and symmetric structure matrix to capture the local geometric structure underlying data.
We perform self-expressiveness on the constructed attribute and structure matrices to learn their affinity graphs.
arXiv Detail & Related papers (2021-09-28T14:00:57Z)
- Structure-Aware Feature Generation for Zero-Shot Learning [108.76968151682621]
We introduce a novel structure-aware feature generation scheme, termed SA-GAN, to account for the topological structure in learning both the latent space and the generative networks.
Our method significantly enhances the generalization capability on unseen classes and consequently improves the classification performance.
arXiv Detail & Related papers (2021-08-16T11:52:08Z)
- Discovery data topology with the closure structure. Theoretical and practical aspects [21.70710923045654]
We introduce a concise representation -- the closure structure -- based on closed itemsets and their minimum generators.
We propose a formalization of the closure structure in terms of Formal Concept Analysis.
We present and demonstrate theoretical results, as well as practical results obtained with the GDPM algorithm.
arXiv Detail & Related papers (2020-10-06T11:21:56Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)