Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
- URL: http://arxiv.org/abs/2407.20899v2
- Date: Sat, 12 Oct 2024 10:25:02 GMT
- Title: Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
- Authors: Adam Wojciechowski, Mateusz Lango, Ondrej Dusek,
- Abstract summary: This paper proposes a post-hoc natural language explanation method that can be applied to any CNN-based classification system.
By analysing influential neurons and the corresponding activation maps, the method generates a faithful description of the classifier's decision process.
Experimental results show that the NLEs constructed by our method are significantly more plausible and faithful.
- Score: 10.54430941755474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing explanation methods for image classification struggle to provide faithful and plausible explanations. This paper addresses this issue by proposing a post-hoc natural language explanation method that can be applied to any CNN-based classifier without altering its training process or affecting predictive performance. By analysing influential neurons and the corresponding activation maps, the method generates a faithful description of the classifier's decision process in the form of a structured meaning representation, which is then converted into text by a language model. Through this pipeline approach, the generated explanations are grounded in the neural network architecture, providing accurate insight into the classification process while remaining accessible to non-experts. Experimental results show that the NLEs constructed by our method are significantly more plausible and faithful. In particular, user interventions in the neural network structure (masking of neurons) are three times more effective than the baselines.
Related papers
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification [5.087579454836169]
State-of-the-art explainability methods generate saliency maps to show where a specific class is identified.
We introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network.
We also show an approach to generate global explanations by aggregating labels across multiple images.
arXiv Detail & Related papers (2024-05-06T09:21:35Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models [14.019349267520541]
We propose a novel method that leverages the capabilities of language models to interpret the learned features of pre-trained image classifiers.
Our approach generates a vast number of sentences to explain the features learned by the classifier for a given image.
Our method, for the first time, utilizes these frequent words corresponding to a visual representation to provide insights into the decision-making process.
arXiv Detail & Related papers (2023-09-01T20:59:46Z) - Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z) - Adversarial Attacks on the Interpretation of Neuron Activation
Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z) - Robust Explanation Constraints for Neural Networks [33.14373978947437]
Post-hoc explanation methods used with the intent of neural networks are sometimes said to help engender trust in their outputs.
Our training method is the only method able to learn neural networks with insights about robustness tested across all six tested networks.
arXiv Detail & Related papers (2022-12-16T14:40:25Z) - Hierarchical Interpretation of Neural Text Classification [31.95426448656938]
This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions.
Experimental results on both review datasets and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers.
arXiv Detail & Related papers (2022-02-20T11:15:03Z) - Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z) - FF-NSL: Feed-Forward Neural-Symbolic Learner [70.978007919101]
This paper introduces a neural-symbolic learning framework, called Feed-Forward Neural-Symbolic Learner (FF-NSL)
FF-NSL integrates state-of-the-art ILP systems based on the Answer Set semantics, with neural networks, in order to learn interpretable hypotheses from labelled unstructured data.
arXiv Detail & Related papers (2021-06-24T15:38:34Z) - Generating Hierarchical Explanations on Text Classification via Feature
Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions.
Such explanations visualize how words and phrases are combined at different levels of the hierarchy.
Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.