Aligned explanations in neural networks
- URL: http://arxiv.org/abs/2601.04378v1
- Date: Wed, 07 Jan 2026 20:35:02 GMT
- Title: Aligned explanations in neural networks
- Authors: Corentin Lobet, Francesca Chiaromonte,
- Abstract summary: We argue that explanations must be directly linked to predictions, rather than serving as post-hoc rationalizations.<n>We present model readability as a design principle enabling alignment, and PiNets as a modeling framework to pursue it in a deep learning context.
- Score: 0.8594140167290095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution is the dominant paradigm for explaining deep neural networks. However, most existing methods only loosely reflect the model's prediction-making process, thereby merely white-painting the black box. We argue that explanatory alignment is a key aspect of trustworthiness in prediction tasks: explanations must be directly linked to predictions, rather than serving as post-hoc rationalizations. We present model readability as a design principle enabling alignment, and PiNets as a modeling framework to pursue it in a deep learning context. PiNets are pseudo-linear networks that produce instance-wise linear predictions in an arbitrary feature space, making them linearly readable. We illustrate their use on image classification and segmentation tasks, demonstrating how PiNets produce explanations that are faithful across multiple criteria in addition to alignment.
Related papers
- Obtaining Example-Based Explanations from Deep Neural Networks [18.708235771482205]
EBE-DNN can provide highly concentrated example attributions, i.e., the predictions can be explained with few training examples.<n>The choice of layer to use for the embeddings may have a large impact on the resulting accuracy.
arXiv Detail & Related papers (2025-02-27T05:10:48Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z) - The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z) - Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations [8.669319624657701]
The proposed method is model-agnostic, i.e., it can be utilised to explain any deep convolutional neural network models.<n>The evolved local explanations on four images, randomly selected from ImageNet, are presented.<n>The proposed method can obtain local explanations within one minute, which is more than ten times faster than LIME.
arXiv Detail & Related papers (2022-11-28T08:56:00Z) - Towards Prototype-Based Self-Explainable Graph Neural Network [37.90997236795843]
We study a novel problem of learning prototype-based self-explainable GNNs that can simultaneously give accurate predictions and prototype-based explanations on predictions.
The learned prototypes are also used to simultaneously make prediction for for a test instance and provide instance-level explanation.
arXiv Detail & Related papers (2022-10-05T00:47:42Z) - Reinforced Causal Explainer for Graph Neural Networks [112.57265240212001]
Explainability is crucial for probing graph neural networks (GNNs)
We propose a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer)
RC-Explainer generates faithful and concise explanations, and has a better power to unseen graphs.
arXiv Detail & Related papers (2022-04-23T09:13:25Z) - Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z) - Sparse Oblique Decision Trees: A Tool to Understand and Manipulate
Neural Net Features [3.222802562733787]
We focus on understanding which of the internal features computed by the neural net are responsible for a particular class.
We show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features.
arXiv Detail & Related papers (2021-04-07T05:31:08Z) - MACE: Model Agnostic Concept Extractor for Explaining Image
Classification Networks [10.06397994266945]
We propose MACE: a Model Agnostic Concept Extractor, which can explain the working of a convolutional network through smaller concepts.
We validate our framework using VGG16 and ResNet50 CNN architectures, and on datasets like Animals With Attributes 2 (AWA2) and Places365.
arXiv Detail & Related papers (2020-11-03T04:40:49Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Interpreting Graph Neural Networks for NLP With Differentiable Edge
Masking [63.49779304362376]
Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models.
We introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges.
We show that we can drop a large proportion of edges without deteriorating the performance of the model.
arXiv Detail & Related papers (2020-10-01T17:51:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.