Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
- URL: http://arxiv.org/abs/2302.14278v2
- Date: Tue, 4 Jun 2024 03:59:23 GMT
- Title: Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
- Authors: Andrea Treviño Gavito, Diego Klabjan, Jean Utke
- Abstract summary: We propose a graph-oriented attention-based explainability method for tabular data.
We take into account the attention matrices of all heads and layers as a whole.
To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explainability methods.
- Score: 11.866061471514582
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a graph-oriented attention-based explainability method for tabular data. Tasks involving tabular data have mostly been solved using traditional tree-based machine learning models, which come with the challenges of feature selection and engineering. With that in mind, we consider a transformer architecture for tabular data, which is amenable to explainability, and present a novel way to leverage the self-attention mechanism to provide explanations by taking into account the attention matrices of all heads and layers as a whole. The matrices are mapped to a graph structure where groups of features correspond to nodes and attention values to arcs. By finding the maximum-probability paths in the graph, we identify groups of features that contribute most to explaining the model's predictions. To assess the quality of multi-layer attention-based explanations, we compare them with popular attention-, gradient-, and perturbation-based explainability methods.
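The abstract describes the core algorithm concretely enough to sketch: treat each layer's attention matrix as arc weights in a layered graph and recover maximum-probability paths by dynamic programming in log space. The snippet below is a minimal illustration of that idea, not the authors' implementation; the function name `max_probability_paths`, the assumption that heads have already been averaged per layer, and the `top_k` selection are choices made here for the sketch.

```python
import numpy as np

def max_probability_paths(attn, top_k=3, eps=1e-12):
    """attn: list of L arrays, each (n, n); attn[l][i, j] is the attention
    that position i pays to position j in layer l (heads already averaged).
    Returns up to top_k (path, probability) pairs, where a path lists one
    node index per level and probability is the product of its arc weights."""
    L, n = len(attn), attn[0].shape[0]
    score = np.zeros(n)        # best log-probability of a path ending at each node
    back = []                  # back-pointers, one array per layer
    for l in range(L):
        logA = np.log(attn[l] + eps)      # arc (j -> i) carries weight logA[i, j]
        cand = score[None, :] + logA      # cand[i, j]: extend the best path at j to i
        back.append(cand.argmax(axis=1))  # best predecessor j for each node i
        score = cand.max(axis=1)
    paths = []
    for end in np.argsort(score)[::-1][:top_k]:
        node, path = int(end), [int(end)]
        for l in range(L - 1, -1, -1):    # walk the back-pointers down to layer 0
            node = int(back[l][node])
            path.append(node)
        paths.append((path[::-1], float(np.exp(score[end]))))
    return paths

# Toy example: 3 layers of head-averaged attention over 5 feature groups,
# with rows normalised so each behaves like a probability distribution.
rng = np.random.default_rng(0)
attn = [np.exp(rng.normal(size=(5, 5))) for _ in range(3)]
attn = [a / a.sum(axis=1, keepdims=True) for a in attn]
for path, prob in max_probability_paths(attn):
    print(path, round(prob, 4))
```

In the paper, nodes correspond to groups of input features rather than raw positions; the same dynamic program applies once the attention matrices are aggregated to that granularity.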
Related papers
- Linear Transformer Topological Masking with Graph Random Features [52.717865653036796]
We show how to parameterise topological masks as a learnable function of a weighted adjacency matrix.
Our efficient masking algorithms provide strong performance gains for tasks on image and point cloud data.
arXiv Detail & Related papers (2024-10-04T14:24:06Z)
- Hierarchical Aggregations for High-Dimensional Multiplex Graph Embedding [7.271256448682229]
HMGE is a novel embedding method based on hierarchical aggregation for high-dimensional multiplex graphs.
We leverage mutual information between local patches and global summaries to train the model without supervision.
Detailed experiments on synthetic and real-world data illustrate the suitability of our approach to downstream supervised tasks.
arXiv Detail & Related papers (2023-12-28T05:39:33Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it complements the edge features conditionally on the node features.
arXiv Detail & Related papers (2022-12-01T11:49:07Z)
- Cell Attention Networks [25.72671436731666]
We introduce Cell Attention Networks (CANs), a neural architecture operating on data defined over the vertices of a graph.
CANs exploit the lower and upper neighborhoods, as encoded in the cell complex, to design two independent masked self-attention mechanisms.
The experimental results show that CAN is a low-complexity strategy that compares favorably with state-of-the-art results on graph-based learning tasks.
arXiv Detail & Related papers (2022-09-16T21:57:39Z)
- Graph Attention Transformer Network for Multi-Label Image Classification [50.0297353509294]
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods can achieve state-of-the-art performance on three datasets.
arXiv Detail & Related papers (2022-03-08T12:39:05Z)
- Siamese Attribute-missing Graph Auto-encoder [35.79233150253881]
We propose the Siamese Attribute-missing Graph Auto-encoder (SAGA).
First, we entangle the attribute embedding and structure embedding by introducing a siamese network structure to share the parameters learned by both processes.
Second, we introduce a K-nearest neighbor (KNN) and structural constraint enhanced learning mechanism to improve the quality of latent features of the missing attributes.
arXiv Detail & Related papers (2021-12-09T11:21:31Z)
- Multiple Graph Learning for Scalable Multi-view Clustering [26.846642220480863]
We propose an efficient multiple graph learning model via a small number of anchor points and tensor Schatten p-norm minimization.
Specifically, we construct a hidden and tractable large graph by anchor graph for each view.
We develop an efficient algorithm, which scales linearly with the data size, to solve our proposed model.
arXiv Detail & Related papers (2021-06-29T13:10:56Z)
- SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism [33.135006052347194]
This paper presents a novel hierarchical subgraph-level selection and embedding based graph neural network for graph classification, namely SUGAR.
SUGAR reconstructs a sketched graph by extracting striking subgraphs as the representative part of the original graph to reveal subgraph-level patterns.
To differentiate subgraph representations among graphs, we present a self-supervised mutual information mechanism to encourage subgraph embedding.
arXiv Detail & Related papers (2021-01-20T15:06:16Z)
- Multilayer Clustered Graph Learning [66.94201299553336]
We use a contrastive loss as a data fidelity term in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method produces a representative graph with clear cluster structure, which we use to solve clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z)