A Disentangling Invertible Interpretation Network for Explaining Latent
Representations
- URL: http://arxiv.org/abs/2004.13166v1
- Date: Mon, 27 Apr 2020 20:43:20 GMT
- Title: A Disentangling Invertible Interpretation Network for Explaining Latent
Representations
- Authors: Patrick Esser, Robin Rombach, Björn Ommer
- Abstract summary: We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user.
The proposed invertible interpretation network can be transparently applied on top of existing architectures.
We present an efficient approach to define semantic concepts by sketching only two images, as well as an unsupervised strategy.
- Score: 19.398202091883366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have greatly boosted performance in computer vision by
learning powerful representations of input data. The drawback of end-to-end
training for maximal overall performance is black-box models whose hidden
representations lack interpretability: Since distributed coding is
optimal for latent layers to improve their robustness, attributing meaning to
parts of a hidden feature vector or to individual neurons is hindered. We
formulate interpretation as a translation of hidden representations onto
semantic concepts that are comprehensible to the user. The mapping between both
domains has to be bijective so that semantic modifications in the target domain
correctly alter the original representation. The proposed invertible
interpretation network can be transparently applied on top of existing
architectures with no need to modify or retrain them. Consequently, we
translate an original representation to an equivalent yet interpretable one and
backwards without affecting the expressiveness and performance of the original.
The invertible interpretation network disentangles the hidden representation
into separate, semantically meaningful concepts. Moreover, we present an
efficient approach to define semantic concepts by sketching only two images, as
well as an unsupervised strategy. Experimental evaluation demonstrates the wide
applicability to interpretation of existing classification and image generation
networks as well as to semantically guided image manipulation.
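As a purely illustrative sketch of the idea described above (not the authors' architecture or training objective), the following code shows how a bijective mapping could sit on top of a frozen network: the hidden representation z is translated to an interpretable representation, split into concept factors, one factor is edited, and the exact inverse maps the edit back into the latent space that the unchanged downstream layers expect. The encoder/head split, the single coupling block, and all dimensions are hypothetical stand-ins.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """Minimal invertible block: half of the vector conditions an affine
    transform of the other half, so an exact inverse exists in closed form."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),  # predicts per-dimension log-scale and shift
        )

    def forward(self, z):
        a, b = z.chunk(2, dim=-1)
        log_s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, b * log_s.exp() + t], dim=-1)

    def inverse(self, z_tilde):
        a, bt = z_tilde.chunk(2, dim=-1)
        log_s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, (bt - t) * (-log_s).exp()], dim=-1)


# Hypothetical frozen model, split into a feature extractor and the remaining head.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # stand-in backbone
head = nn.Linear(64, 10)                                           # stand-in classifier head
for p in list(encoder.parameters()) + list(head.parameters()):
    p.requires_grad_(False)                                        # original network stays untouched

translator = AffineCoupling(dim=64)     # invertible "translation" trained separately on top

x = torch.randn(8, 3, 32, 32)
z = encoder(x)                          # original hidden representation
z_tilde = translator(z)                 # equivalent, more interpretable representation
concept_a, concept_b, residual = z_tilde.split([16, 16, 32], dim=-1)

# Edit one semantic factor (here: swap it with another image in the batch) and invert.
edited = torch.cat([concept_a, concept_b.roll(1, dims=0), residual], dim=-1)
z_edited = translator.inverse(edited)   # back to the original latent space
logits = head(z_edited)                 # frozen network consumes the modified representation

# Sanity check: translation followed by its inverse is (numerically) the identity.
assert torch.allclose(translator.inverse(translator(z)), z, atol=1e-4)
```

The training signal (sketched image pairs or the unsupervised strategy mentioned in the abstract) is omitted; the sketch only demonstrates that edits made in the concept space map back exactly to the original representation space without touching the frozen model.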
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to reweight the training samples adaptively during model learning.
The framework then promotes model learning by paying closer attention to training samples with a large difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Identifying Interpretable Subspaces in Image Representations [54.821222487956355]
We propose a framework to explain features of image representations using Contrasting Concepts (FALCON).
For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset and a pre-trained vision-language model like CLIP.
Each word among the captions is scored and ranked, leading to a small number of shared, human-understandable concepts.
arXiv Detail & Related papers (2023-07-20T00:02:24Z)
- DProtoNet: Decoupling the inference module and the explanation module enables neural networks to have better accuracy and interpretability [5.333582981327497]
Previous methods modify the architecture of the neural network so that the network simulates the human reasoning process.
We propose DProtoNet (Decoupling Prototypical network), which stores the decision basis of the neural network using feature masks.
It decouples the neural network's inference module from its interpretation module and removes the specific architectural limitations of the interpretable network.
arXiv Detail & Related papers (2022-10-15T17:05:55Z)
- Adversarially robust segmentation models learn perceptually-aligned gradients [0.0]
We show that adversarially-trained semantic segmentation networks can be used to perform image inpainting and generation.
We argue that perceptually-aligned gradients promote a better understanding of a neural network's learned representations and aid in making neural networks more interpretable.
arXiv Detail & Related papers (2022-04-03T16:04:52Z)
- Fair Interpretable Representation Learning with Correction Vectors [60.0806628713968]
We propose a new framework for fair representation learning that is centered around the learning of "correction vectors".
We show experimentally that several fair representation learning models constrained in such a way do not exhibit losses in ranking or classification performance.
arXiv Detail & Related papers (2022-02-07T11:19:23Z)
- Fair Interpretable Learning via Correction Vectors [68.29997072804537]
We propose a new framework for fair representation learning centered around the learning of "correction vectors".
The corrections are simply added to the original features and can therefore be analyzed as an explicit penalty or bonus on each feature (a minimal sketch of this structure follows after this list).
We show experimentally that a fair representation learning problem constrained in this way does not impact performance.
arXiv Detail & Related papers (2022-01-17T10:59:33Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Contextual Semantic Interpretability [16.18912769522768]
We look into semantic bottlenecks that capture context.
We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups.
Our model yields predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images.
arXiv Detail & Related papers (2020-09-18T09:47:05Z)
- Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs [19.398202091883366]
We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these invariances combined with the model representation into an equally expressive one with accessible semantic concepts.
Our invertible approach significantly extends the abilities to understand black box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance.
arXiv Detail & Related papers (2020-08-04T19:27:46Z)
- Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning [150.42959029611657]
The Domain-aware Visual Bias Eliminating (DVBE) network constructs two complementary visual representations.
For unseen images, we automatically search for an optimal semantic-visual alignment architecture.
arXiv Detail & Related papers (2020-03-30T08:17:04Z)
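For the two correction-vector entries above, which describe adding a learned per-feature correction to the original features, the following minimal sketch (hypothetical module names and feature dimensions, with no fairness objective or training loop) illustrates why the corrections can be read as an explicit per-feature penalty or bonus.

```python
import torch
import torch.nn as nn

N_FEATURES = 12                          # hypothetical number of tabular input features

# Correction module: outputs one additive correction per input feature.
correction_net = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, N_FEATURES),
)
classifier = nn.Linear(N_FEATURES, 2)    # stand-in downstream predictor

x = torch.randn(32, N_FEATURES)          # a batch of raw features
w = correction_net(x)                    # learned correction vectors
x_corrected = x + w                      # corrections are simply added to the features
logits = classifier(x_corrected)

# Interpretability hook: the sign and size of each correction show how strongly a
# feature is pushed up or down (in a trained model this reflects the fairness constraint).
print(w.mean(dim=0))
```

The point of this design is that the learned adjustment lives in the original feature space, so it can be inspected feature by feature rather than being hidden inside an opaque embedding.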