A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
- URL: http://arxiv.org/abs/2502.15886v1
- Date: Fri, 21 Feb 2025 19:09:40 GMT
- Title: A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
- Authors: Leila Arras, Bruno Puri, Patrick Kahardipraja, Sebastian Lapuschkin, Wojciech Samek
- Abstract summary: XAI attribution methods have been recently proposed for the transformer architecture. We compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant. We make our carefully constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available.
- Score: 12.51070801823624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various XAI attribution methods have been recently proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One class of methods that seems very promising in this direction includes decomposition-based approaches, i.e., XAI-methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. We note, however, that two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition in the previous literature, and hence we propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2 and LLaMA-3 as a testbed. Along the way we compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant, from an algorithmic and implementation perspective. We further incorporate in our benchmark two widely-used gradient-based attribution techniques. Finally, we make our carefully constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available in order to foster evaluation of XAI-methods on a well-defined common ground.
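As a point of reference for the attribution setting described in the abstract, the following is a minimal, hypothetical sketch of one of the widely-used gradient-based baselines (gradient x input) applied to a subject-verb agreement decision with GPT-2 via the HuggingFace transformers library. It is not the authors' released code, and the example sentence and verb pair are illustrative assumptions only; it merely shows how a contrastive verb logit can be attributed back to input tokens.

```python
# Minimal sketch (assumption, not the paper's implementation): gradient-x-input
# token attribution for a subject-verb agreement decision with GPT-2.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Context whose next token should be the verb; "keys" (plural) is the subject,
# "cabinet" (singular) is a distractor attractor noun.
context = "The keys to the cabinet"
inputs = tokenizer(context, return_tensors="pt")

# Embed the input tokens and track gradients on the embeddings.
embeddings = model.transformer.wte(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

outputs = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"])
logits = outputs.logits[0, -1]  # next-token logits at the last position

correct_id = tokenizer.encode(" are")[0]    # verb form agreeing with "keys"
incorrect_id = tokenizer.encode(" is")[0]   # verb form agreeing with the distractor

# Contrastive target: logit of the correct minus the incorrect verb form.
target = logits[correct_id] - logits[incorrect_id]
target.backward()

# Gradient x input, summed over the embedding dimension: one score per token.
relevance = (embeddings.grad * embeddings).detach().sum(dim=-1).squeeze(0)
for tok_id, score in zip(inputs["input_ids"][0].tolist(), relevance.tolist()):
    print(f"{tokenizer.decode([tok_id]):>12s}  {score:+.4f}")
```

Decomposition-based methods such as LRP, AttnLRP and ALTI-Logit differ from this baseline in that they propagate the chosen logit backward layer by layer through the network rather than relying on a single gradient pass.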
Related papers
- ODExAI: A Comprehensive Object Detection Explainable AI Evaluation [1.338174941551702]
We introduce the Object Detection Explainable AI Evaluation (ODExAI) to assess XAI methods in object detection.
We benchmark a set of XAI methods across two widely used object detectors and standard datasets.
arXiv Detail & Related papers (2025-04-27T14:16:14Z) - NormXLogit: The Head-on-Top Never Lies [15.215985417763472]
Transformer architecture has emerged as the dominant choice for building large language models.
We propose a novel technique, called NormXLogit, for assessing the significance of individual input tokens.
We show that our approach consistently outperforms existing gradient-based methods in terms of faithfulness.
arXiv Detail & Related papers (2024-11-25T10:12:27Z) - Interpreting Object-level Foundation Models via Visual Precision Search [53.807678972967224]
We propose a Visual Precision Search method that generates accurate attribution maps with fewer regions.
Our method bypasses internal model parameters to overcome attribution issues from multimodal fusion.
Our method can interpret failures in visual grounding and object detection tasks, surpassing existing methods across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-25T08:54:54Z) - Deep Model Interpretation with Limited Data : A Coreset-based Approach [0.810304644344495]
We propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task.
We propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods towards the amount of data they take as input.
arXiv Detail & Related papers (2024-10-01T09:07:24Z) - Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z) - Precise Benchmarking of Explainable AI Attribution Methods [0.0]
We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods.
Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations.
Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods.
arXiv Detail & Related papers (2023-08-06T17:03:32Z) - Open-Domain Text Evaluation via Contrastive Distribution Methods [75.59039812868681]
We introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods.
Our experiments on coherence evaluation for multi-turn dialogue and commonsense evaluation for controllable generation demonstrate CDM's superior correlation with human judgment.
arXiv Detail & Related papers (2023-06-20T20:37:54Z) - Local and Global Context-Based Pairwise Models for Sentence Ordering [0.0]
In this paper, we put forward a set of robust local and global context-based pairwise ordering strategies.
Our proposed encoding method utilizes the paragraph's rich global contextual information to predict the pairwise order.
Analysis of the two proposed decoding strategies helps better explain error propagation in pairwise models.
arXiv Detail & Related papers (2021-10-08T17:57:59Z) - Learning Gaussian Graphical Models with Latent Confounders [74.72998362041088]
We compare and contrast two strategies for inference in graphical models with latent confounders.
While these two approaches have similar goals, they are motivated by different assumptions about confounding.
We propose a new method, which combines the strengths of these two approaches.
arXiv Detail & Related papers (2021-05-14T00:53:03Z) - Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models [107.86965028729517]
Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions.
We propose several novel methods to estimate the ILM directly from the AED model.
arXiv Detail & Related papers (2021-04-12T15:16:03Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)