Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
- URL: http://arxiv.org/abs/2302.14278v2
- Date: Tue, 4 Jun 2024 03:59:23 GMT
- Title: Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
- Authors: Andrea Treviño Gavito, Diego Klabjan, Jean Utke
- Abstract summary: We propose a graph-oriented attention-based explainability method for tabular data.
We take into account the attention matrices of all heads and layers as a whole.
To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explainability methods.
- Score: 11.866061471514582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a graph-oriented attention-based explainability method for tabular data. Tasks involving tabular data have mostly been solved using traditional tree-based machine learning models, which come with the challenges of feature selection and engineering. With that in mind, we consider a transformer architecture for tabular data, which is amenable to explainability, and present a novel way to leverage the self-attention mechanism to provide explanations by taking into account the attention matrices of all heads and layers as a whole. The matrices are mapped to a graph structure where groups of features correspond to nodes and attention values to arcs. By finding the maximum-probability paths in the graph, we identify groups of features that contribute most to explaining the model's predictions. To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explainability methods.
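The abstract describes the core algorithm concretely enough to sketch: treat each layer's attention matrix as arc weights in a layered graph and recover maximum-probability paths by dynamic programming in log space. The snippet below is a minimal illustration of that idea, not the authors' implementation; the function name `max_probability_paths`, the assumption that heads have already been averaged per layer, and the `top_k` selection are choices made here for the sketch.

```python
import numpy as np

def max_probability_paths(attn, top_k=3, eps=1e-12):
    """attn: list of L arrays, each (n, n); attn[l][i, j] is the attention
    that position i pays to position j in layer l (heads already averaged).
    Returns up to top_k (path, probability) pairs, where a path lists one
    node index per level and probability is the product of its arc weights."""
    L, n = len(attn), attn[0].shape[0]
    score = np.zeros(n)        # best log-probability of a path ending at each node
    back = []                  # back-pointers, one array per layer
    for l in range(L):
        logA = np.log(attn[l] + eps)      # arc (j -> i) carries weight logA[i, j]
        cand = score[None, :] + logA      # cand[i, j]: extend the best path at j to i
        back.append(cand.argmax(axis=1))  # best predecessor j for each node i
        score = cand.max(axis=1)
    paths = []
    for end in np.argsort(score)[::-1][:top_k]:
        node, path = int(end), [int(end)]
        for l in range(L - 1, -1, -1):    # walk the back-pointers down to layer 0
            node = int(back[l][node])
            path.append(node)
        paths.append((path[::-1], float(np.exp(score[end]))))
    return paths

# Toy example: 3 layers of head-averaged attention over 5 feature groups,
# with rows normalised so each behaves like a probability distribution.
rng = np.random.default_rng(0)
attn = [np.exp(rng.normal(size=(5, 5))) for _ in range(3)]
attn = [a / a.sum(axis=1, keepdims=True) for a in attn]
for path, prob in max_probability_paths(attn):
    print(path, round(prob, 4))
```

In the paper, nodes correspond to groups of input features rather than raw positions; the same dynamic program applies once the attention matrices are aggregated to that granularity.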
Related papers
- Linear Transformer Topological Masking with Graph Random Features [52.717865653036796]
We show how to parameterise topological masks as a learnable function of a weighted adjacency matrix.
Our efficient masking algorithms provide strong performance gains for tasks on image and point cloud data.
arXiv Detail & Related papers (2024-10-04T14:24:06Z)
- Hierarchical Aggregations for High-Dimensional Multiplex Graph Embedding [7.271256448682229]
HMGE is a novel embedding method based on hierarchical aggregation for high-dimensional multiplex graphs.
We leverage mutual information between local patches and global summaries to train the model without supervision.
Detailed experiments on synthetic and real-world data illustrate the suitability of our approach to downstream supervised tasks.
arXiv Detail & Related papers (2023-12-28T05:39:33Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
- Cell Attention Networks [25.72671436731666]
We introduce Cell Attention Networks (CANs), a neural architecture operating on data defined over the vertices of a graph.
CANs exploit the lower and upper neighborhoods, as encoded in the cell complex, to design two independent masked self-attention mechanisms.
The experimental results show that CAN is a low-complexity strategy that compares favorably with state-of-the-art results on graph-based learning tasks.
arXiv Detail & Related papers (2022-09-16T21:57:39Z)
- Graph Attention Transformer Network for Multi-Label Image Classification [50.0297353509294]
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods can achieve state-of-the-art performance on three datasets.
arXiv Detail & Related papers (2022-03-08T12:39:05Z)
- Siamese Attribute-missing Graph Auto-encoder [35.79233150253881]
We propose the Siamese Attribute-missing Graph Auto-encoder (SAGA).
First, we entangle the attribute embedding and structure embedding by introducing a siamese network structure to share the parameters learned by both processes.
Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of latent features of the missing attributes.
arXiv Detail & Related papers (2021-12-09T11:21:31Z)
- Multiple Graph Learning for Scalable Multi-view Clustering [26.846642220480863]
We propose an efficient multiple graph learning model via a small number of anchor points and tensor Schatten p-norm minimization.
Specifically, we construct a hidden and tractable large graph by anchor graph for each view.
We develop an efficient algorithm, which scales linearly with the data size, to solve our proposed model.
arXiv Detail & Related papers (2021-06-29T13:10:56Z)
- SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism [33.135006052347194]
This paper presents a novel hierarchical subgraph-level selection and embedding based graph neural network for graph classification, namely SUGAR.
SUGAR reconstructs a sketched graph by extracting striking subgraphs as the representative part of the original graph to reveal subgraph-level patterns.
To differentiate subgraph representations among graphs, we present a self-supervised mutual information mechanism to encourage subgraph embedding.
arXiv Detail & Related papers (2021-01-20T15:06:16Z)
- Multilayer Clustered Graph Learning [66.94201299553336]
We use a contrastive loss as a data fidelity term in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method produces a representative graph with clear cluster structure, which we use to solve clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z)