Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature
Semantics
- URL: http://arxiv.org/abs/2111.05898v1
- Date: Wed, 10 Nov 2021 19:42:33 GMT
- Title: Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature
Semantics
- Authors: Amirata Ghorbani, Dina Berenbaum, Maor Ivgi, Yuval Dafna, James Zou
- Abstract summary: Interpretability is becoming an active research topic as machine learning (ML) models are more widely used to make critical decisions.
Most existing interpretability methods for tabular data only report feature-importance scores.
We address this limitation by introducing Feature Vectors, a new global interpretability method.
- Score: 17.410093908967976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability is becoming an active research topic as machine learning
(ML) models are more widely used to make critical decisions. Tabular data is
one of the most commonly used modes of data in diverse applications such as
healthcare and finance. Most existing interpretability methods for tabular data
only report feature-importance scores -- either locally (per
example) or globally (per model) -- but they do not provide interpretation or
visualization of how the features interact. We address this limitation by
introducing Feature Vectors, a new global interpretability method designed for
tabular datasets. In addition to providing feature-importance scores, Feature Vectors
discovers the inherent semantic relationships among features via an intuitive
feature visualization technique. Our systematic experiments demonstrate the
empirical utility of this new method by applying it to several real-world
datasets. We further provide an easy-to-use Python package for Feature Vectors.
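As an illustration of the kind of analysis described above (and not the package's actual API), here is a minimal sketch: it estimates global feature importances, embeds a crude feature-relatedness matrix into 2D, and plots each feature as a labeled point sized by its importance. The importance and relatedness estimators used here (permutation importance, absolute correlation) are stand-in assumptions.

```python
# A minimal sketch (not the official Feature Vectors API): plot features as points
# whose position reflects pairwise relatedness and whose size reflects importance,
# assuming a pandas DataFrame X and target y.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.decomposition import PCA

def sketch_feature_vectors(X: pd.DataFrame, y):
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    # Global, model-agnostic importance per feature.
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=0).importances_mean
    # Crude stand-in for "semantic relatedness": absolute feature-feature correlation.
    sim = np.abs(np.corrcoef(X.values, rowvar=False))
    # Embed the features (not the samples) into 2D for visualization.
    coords = PCA(n_components=2).fit_transform(sim)
    plt.scatter(coords[:, 0], coords[:, 1], s=1000 * np.clip(imp, 0, None) + 10)
    for name, (x0, y0) in zip(X.columns, coords):
        plt.annotate(name, (x0, y0))
    plt.title("Sketch: feature importance and relatedness")
    plt.show()
```

The actual Feature Vectors method derives feature positions from the model's learned semantics rather than from raw correlations, so this sketch only conveys the flavor of the visualization.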
Related papers
- InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation [7.67293014317639]
We propose a variant of the TabNet model that models the attention mechanism as a latent variable sampled from a Gumbel-Softmax distribution.
This enables us to regularize the model to learn distinct concepts in the attention masks via a KL Divergence regularizer.
It prevents overlapping feature selection by promoting sparsity, which maximizes the model's efficacy and improves interpretability.
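A hedged sketch of the mechanism summarized above (not the authors' code): feature-attention logits are relaxed with Gumbel-Softmax, and a simplified Bernoulli-style KL term nudges the resulting masks toward sparsity. The function names and the exact form of the regularizer are assumptions.

```python
# Sketch only: Gumbel-Softmax feature masks plus a simplified KL sparsity penalty.
import torch
import torch.nn.functional as F

def sample_mask(logits, tau=0.5):
    # logits: (batch, n_features); returns a relaxed, near-one-hot feature mask.
    return F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)

def kl_sparsity_penalty(mask, prior_p=0.1, eps=1e-8):
    # KL between the mean mask activation and a sparse Bernoulli-style prior
    # (a simplification of the regularizer described in the abstract).
    p = mask.mean(dim=0).clamp(eps, 1 - eps)
    prior = torch.full_like(p, prior_p)
    return (p * (p / prior).log() + (1 - p) * ((1 - p) / (1 - prior)).log()).sum()

# Assumed usage in a training loop:
#   loss = task_loss + lam * kl_sparsity_penalty(sample_mask(logits))
```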
arXiv Detail & Related papers (2024-06-01T12:48:11Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine [3.2748787252933442]
DimVis is a tool that employs supervised Explainable Boosting Machine (EBM) models as an interpretation assistant for DR projections.
Our tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters.
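A minimal sketch of a DimVis-style workflow, with t-SNE and k-means as stand-ins for the projection and the interactive cluster selection the tool provides: fit an Explainable Boosting Machine to separate the chosen visual cluster from the rest and read off its per-feature explanations.

```python
# Sketch only: explain a visual cluster in a 2D projection with an EBM.
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from interpret.glassbox import ExplainableBoostingClassifier  # pip install interpret

def explain_cluster(X, cluster_of_interest=0, n_clusters=5):
    # Project to 2D and form visual clusters (stand-ins for the interactive step).
    coords = TSNE(n_components=2, random_state=0).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(coords)
    y = (labels == cluster_of_interest).astype(int)   # selected cluster vs. the rest
    # The EBM explains which original features characterize the selected cluster.
    ebm = ExplainableBoostingClassifier().fit(X, y)
    return ebm.explain_global()
```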
arXiv Detail & Related papers (2024-02-10T04:50:36Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen in the infoVerse space consistently outperform strong baselines.
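A hedged sketch of the general idea (the function names are illustrative, not the infoVerse API): stack several model-driven meta-features per sample, then pick a diverse subset in that meta-information space, e.g. for annotation or pruning.

```python
# Sketch only: per-sample meta-information matrix plus diverse subset selection.
import numpy as np

def meta_features(probs: np.ndarray) -> np.ndarray:
    # probs: (n_samples, n_classes) predicted probabilities from any model.
    conf = probs.max(axis=1)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.stack([conf, entropy, margin], axis=1)

def select_diverse(meta: np.ndarray, k: int) -> list:
    # Greedy farthest-point selection in meta-information space.
    chosen = [int(np.argmax(meta[:, 1]))]           # start from the highest-entropy sample
    d = np.linalg.norm(meta - meta[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(meta - meta[nxt], axis=1))
    return chosen
```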
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- PTab: Using the Pre-trained Language Model for Modeling Tabular Data [5.791972449406902]
Recent studies show that neural models are effective at learning contextual representations for tabular data.
We propose a novel framework PTab, using the Pre-trained language model to model Tabular data.
Our method achieves a better average AUC score in supervised settings than state-of-the-art baselines.
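A minimal sketch of the general recipe, assuming a BERT-style backbone from Hugging Face transformers (the authors' exact serialization and training setup may differ): serialize each row as text and fine-tune a pretrained language model as a sequence classifier.

```python
# Sketch only: textualize tabular rows and feed them to a pretrained LM.
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def row_to_text(row: pd.Series) -> str:
    return " ; ".join(f"{col} is {val}" for col, val in row.items())

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def encode(df: pd.DataFrame):
    texts = [row_to_text(r) for _, r in df.iterrows()]
    return tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# The encoded batch can then be passed to `model` and fine-tuned with a standard
# cross-entropy objective on the downstream labels.
```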
arXiv Detail & Related papers (2022-09-15T08:58:42Z)
- DIWIFT: Discovering Instance-wise Influential Features for Tabular Data [29.69737486124891]
Tabular data is one of the most common data storage formats in business applications, ranging from retail and banking to e-commerce.
One of the critical problems in learning from tabular data is to distinguish influential features from all the predetermined features.
We propose a novel method for discovering instance-wise influential features for tabular data (DIWIFT).
Our method minimizes the loss on a validation set and is thus more robust to the distribution shift between the training and test datasets.
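A hedged sketch of instance-wise feature selection in PyTorch (not the DIWIFT implementation): a small scorer network gates each feature per instance before the predictor; per the abstract, the gating parameters would additionally be tuned to minimize loss on a held-out validation set.

```python
# Sketch only: per-instance feature gating in front of a predictor.
import torch
import torch.nn as nn

class InstanceWiseGate(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_features))
        self.predictor = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))

    def forward(self, x):
        gate = torch.sigmoid(self.scorer(x))     # per-instance feature weights in (0, 1)
        return self.predictor(x * gate), gate    # gated prediction plus the selection mask
```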
arXiv Detail & Related papers (2022-07-06T16:07:46Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm based on graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
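A minimal sketch of the message-passing idea, assuming a plain mean-aggregation step instead of a learned graph neural network: embeddings flow from known features to samples and back, so a new feature's incidence column yields an extrapolated embedding.

```python
# Sketch only: one round of mean aggregation over a feature-data bipartite graph.
import torch

def extrapolate_feature_embeddings(X: torch.Tensor, feat_emb: torch.Tensor) -> torch.Tensor:
    # X: (n_samples, n_features) incidence of features in samples (binary or real).
    # feat_emb: (n_features, d) embeddings of currently known features.
    # 1) Sample embeddings: mean of the embeddings of the features each sample contains.
    deg_s = X.sum(dim=1, keepdim=True).clamp(min=1)
    sample_emb = (X @ feat_emb) / deg_s
    # 2) Feature embeddings: mean of the embeddings of the samples each feature appears in;
    #    applying this step to a new feature's incidence column yields its embedding.
    deg_f = X.sum(dim=0, keepdim=True).clamp(min=1)
    return (X.t() @ sample_emb) / deg_f.t()
```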
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
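A minimal PyTorch sketch of the objective described above (layer sizes and subset handling are assumptions): encode only a sampled subset of the columns and train the decoder to reconstruct the full feature vector.

```python
# Sketch only: autoencoder that sees a column subset but reconstructs every feature.
import torch
import torch.nn as nn

class SubsetAutoencoder(nn.Module):
    def __init__(self, n_features: int, n_subset: int, latent: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_subset, 64), nn.ReLU(),
                                     nn.Linear(64, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x, cols):
        z = self.encoder(x[:, cols])     # encode only the sampled subset (len(cols) == n_subset)
        return self.decoder(z)           # reconstruct every feature of the row

# Assumed training objective:
#   loss = torch.nn.functional.mse_loss(model(x, sampled_columns), x)
```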
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation.
Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance?
We show that it is of paramount importance to report variance in experiments, and that human-generated references can vary drastically across datasets and tasks, revealing the nature of each task.
arXiv Detail & Related papers (2020-10-07T20:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all listed information) and is not responsible for any consequences.