Related papers: Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

URL: http://arxiv.org/abs/2405.13536v2
Date: Thu, 09 Jan 2025 17:58:44 GMT
Title: Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers
Authors: Tobias Leemann, Alina Fastowski, Felix Pfeiffer, Gjergji Kasneci,
Abstract summary: transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution.<n>We introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework.<n>We highlight SLALOM's unique efficiency-quality curve by showing that SLALOM can produce explanations with substantially higher fidelity than competing surrogate models.
Score: 12.986126243018452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods to explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing linear or additive surrogate models used for feature attribution, undermining the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM demonstrates the capacity to deliver a range of insightful explanations with both synthetic and real-world datasets. We highlight SLALOM's unique efficiency-quality curve by showing that SLALOM can produce explanations with substantially higher fidelity than competing surrogate models or provide explanations of comparable quality at a fraction of their computational costs. We release code for SLALOM as an open-source project online at https://github.com/tleemann/slalom_explanations.

Related papers

Minimizing False-Positive Attributions in Explanations of Non-Linear Models [2.6292731747611575]
Suppressor variables can influence model predictions without being dependent on the target outcome.<n>We introduce PatternLocal, a novel XAI technique that addresses this gap.
arXiv Detail & Related papers (2025-05-16T13:06:12Z)
Disentanglement with Factor Quantized Variational Autoencoders [11.086500036180222]
We propose a discrete variational autoencoder (VAE) based model where the ground truth information about the generative factors are not provided to the model. We demonstrate the advantages of learning discrete representations over learning continuous representations in facilitating disentanglement. Our method called FactorQVAE is the first method that combines optimization based disentanglement approaches with discrete representation learning.
arXiv Detail & Related papers (2024-09-23T09:33:53Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs. We present a unified view of these models, formulating such layers as implicit causal self-attention layers. Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
arXiv Detail & Related papers (2024-05-26T09:57:45Z)
MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.68829963458408]
We present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv Detail & Related papers (2024-04-20T08:34:39Z)
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers [14.147646140595649]
Large Language Models are prone to biased predictions and hallucinations. achieving faithful attributions for the entirety of a black-box transformer model and maintaining computational efficiency is an unsolved challenge.
arXiv Detail & Related papers (2024-02-08T12:01:24Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-re (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
Distributional Learning of Variational AutoEncoder: Application to Synthetic Data Generation [0.7614628596146602]
We propose a new approach that expands the model capacity without sacrificing the computational advantages of the VAE framework. Our VAE model's decoder is composed of an infinite mixture of asymmetric Laplace distribution. We apply the proposed model to synthetic data generation, and particularly, our model demonstrates superiority in easily adjusting the level of data privacy.
arXiv Detail & Related papers (2023-02-22T11:26:50Z)
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
A cooperative Multi-A gent R einforcement Learning (MARL) with permutation invariant agents framework has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
arXiv Detail & Related papers (2022-09-20T16:42:59Z)
MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE) In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data. Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably. We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
Learning Deep Implicit Fourier Neural Operators (IFNOs) with Applications to Heterogeneous Material Modeling [3.9181541460605116]
We propose to use data-driven modeling to predict a material's response without using conventional models. The material response is modeled by learning the implicit mappings between loading conditions and the resultant displacement and/or damage fields. We demonstrate the performance of our proposed method for a number of examples, including hyperelastic, anisotropic and brittle materials.
arXiv Detail & Related papers (2022-03-15T19:08:13Z)
XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling [8.405013085269976]
We propose embedded-model flows which alternate general-purpose transformations with structured layers that embed domain-specific inductive biases. We demonstrate that EMFs can be used to induce desirable properties such as multimodality, hierarchical coupling and continuity. In experiments, we show that this approach outperforms state-of-the-art methods in common structured inference problems.
arXiv Detail & Related papers (2021-10-12T14:12:16Z)
CARE: Coherent Actionable Recourse based on Sound Counterfactual Explanations [0.0]
This paper introduces CARE, a modular explanation framework that addresses the model- and user-level desiderata. As a model-agnostic approach, CARE generates multiple, diverse explanations for any black-box model.
arXiv Detail & Related papers (2021-08-18T15:26:59Z)
Causality-aware counterfactual confounding adjustment for feature representations learned by deep models [14.554818659491644]
Causal modeling has been recognized as a potential solution to many challenging problems in machine learning (ML) We describe how a recently proposed counterfactual approach can still be used to deconfound the feature representations learned by deep neural network (DNN) models.
arXiv Detail & Related papers (2020-04-20T17:37:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.