From Attribution Maps to Human-Understandable Explanations through
Concept Relevance Propagation
- URL: http://arxiv.org/abs/2206.03208v2
- Date: Sat, 6 Jan 2024 16:04:47 GMT
- Title: From Attribution Maps to Human-Understandable Explanations through
Concept Relevance Propagation
- Authors: Reduan Achtibat, Maximilian Dreyer, Ilona Eisenbraun, Sebastian Bosse,
Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
- Abstract summary: The field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models.
While local XAI methods explain individual predictions in form of attribution maps, global explanation techniques visualize what concepts a model has generally learned to encode.
- Score: 16.783836191022445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of eXplainable Artificial Intelligence (XAI) aims to bring
transparency to today's powerful but opaque deep learning models. While local
XAI methods explain individual predictions in form of attribution maps, thereby
identifying where important features occur (but not providing information about
what they represent), global explanation techniques visualize what concepts a
model has generally learned to encode. Both types of methods thus only provide
partial insights and leave the burden of interpreting the model's reasoning to
the user. In this work we introduce the Concept Relevance Propagation (CRP)
approach, which combines the local and global perspectives and thus allows
answering both the "where" and "what" questions for individual predictions. We
demonstrate the capability of our method in various settings, showcasing that
CRP leads to more human interpretable explanations and provides deep insights
into the model's representation and reasoning through concept atlases, concept
composition analyses, and quantitative investigations of concept subspaces and
their role in fine-grained decision making.
Related papers
- A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are known to be the thinking ground of humans.
We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs)
We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z) - An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep
Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Mapping Knowledge Representations to Concepts: A Review and New
Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human understandable concepts.
We find this taxonomy and theories of causality, useful for understanding what can be expected, and not expected, from neural network explanations.
The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z) - Concept-Based Explanations for Tabular Data [0.0]
We propose a concept-based explainability for Deep Neural Networks (DNNs)
We show the validity of our method in generating interpretability results that match the human-level intuitions.
We also propose a notion of fairness based on TCAV that quantifies what layer of DNN has learned representations that lead to biased predictions.
arXiv Detail & Related papers (2022-09-13T02:19:29Z) - Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z) - Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations including the class of Concept Activation Vectors (CAV)
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.