Latent Space Explanation by Intervention
- URL: http://arxiv.org/abs/2112.04895v1
- Date: Thu, 9 Dec 2021 13:23:19 GMT
- Title: Latent Space Explanation by Intervention
- Authors: Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan
- Abstract summary: This study aims to reveal hidden concepts by employing an intervention mechanism that shifts the predicted class based on discrete variational autoencoders.
An explanatory model then visualizes encoded information from any hidden layer and its corresponding intervened representation.
- Score: 16.43087660376697
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The success of deep neural nets heavily relies on their ability to encode
complex relations between their input and their output. While this property
serves to fit the training data well, it also obscures the mechanism that
drives prediction. This study aims to reveal hidden concepts by employing an
intervention mechanism that shifts the predicted class based on discrete
variational autoencoders. An explanatory model then visualizes the encoded
information from any hidden layer and its corresponding intervened
representation. By the assessment of differences between the original
representation and the intervened representation, one can determine the
concepts that can alter the class, hence providing interpretability. We
demonstrate the effectiveness of our approach on CelebA, where we show various
visualizations for bias in the data and suggest different interventions to
reveal and change bias.
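The core loop is easy to illustrate. The toy sketch below is an illustration only, not the authors' discrete-VAE implementation (every name and model in it is an assumption): hidden activations are binarized into discrete codes, a linear head classifies from the codes, and an intervention flips code bits until the predicted class shifts; the bits that had to change point to the class-altering concepts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (all assumptions for illustration): activations from a hidden
# layer, a sign-based discrete encoder, and a linear classifier over the codes.
n_samples, n_codes = 6, 16
hidden = rng.normal(size=(n_samples, n_codes))
codes = (hidden > 0).astype(float)          # discrete (binary) latent codes
W = rng.normal(size=(n_codes, 2))           # toy two-class classifier head

def predict(c):
    return int(np.argmax(c @ W))

def intervene(code):
    """Flip one code bit at a time, always the flip that most lowers the
    original class logit, until the predicted class shifts."""
    c, original = code.copy(), predict(code)
    for _ in range(n_codes):
        if predict(c) != original:
            break
        flips = []
        for j in range(n_codes):
            f = c.copy()
            f[j] = 1.0 - f[j]
            flips.append((f @ W)[original])
        j_best = int(np.argmin(flips))
        c[j_best] = 1.0 - c[j_best]
    return c

for i in range(n_samples):
    c0, c1 = codes[i], intervene(codes[i])
    print(f"sample {i}: class {predict(c0)} -> {predict(c1)}, "
          f"intervened code units: {np.flatnonzero(c0 != c1)}")
```

Comparing each original code with its intervened version, as in the loop above, is the step that surfaces the concepts able to alter the class.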
Related papers
- Robust Domain Generalisation with Causal Invariant Bayesian Neural Networks [9.999199798941424]
We propose a Bayesian neural architecture that disentangles the learning of the data distribution from the inference process mechanisms.
We show theoretically and experimentally that our model approximates reasoning under causal interventions.
arXiv Detail & Related papers (2024-10-08T20:38:05Z)
- Intervention Lens: from Representation Surgery to String Counterfactuals [106.98481791980367]
Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior.
We give a method to convert representation counterfactuals into string counterfactuals.
The resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
arXiv Detail & Related papers (2024-02-17T18:12:02Z)
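A minimal sketch of the kind of representation-space intervention the entry above refers to: projecting a concept direction out of the hidden states. This projection edit is a common simple variant chosen for illustration; the paper's contribution of mapping edited representations back to string counterfactuals is not shown, and all names below are assumptions.

```python
import numpy as np

def project_out(h, v):
    """Remove the component of hidden states h (n, d) along a concept
    direction v (d,): a simple linear intervention on the representation."""
    v = v / np.linalg.norm(v)
    return h - np.outer(h @ v, v)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 32))     # toy LM hidden states (assumption)
concept = rng.normal(size=32)         # direction encoding the targeted concept
edited = project_out(hidden, concept)
# The edited states carry no component along the concept direction.
print(np.allclose(edited @ (concept / np.linalg.norm(concept)), 0))
```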
- Understanding Distributed Representations of Concepts in Deep Neural Networks without Supervision [25.449397570387802]
We propose an unsupervised method for discovering distributed representations of concepts by selecting a principal subset of neurons.
Our empirical findings demonstrate that instances with similar neuron activation states tend to share coherent concepts.
It can be utilized to identify unlabeled subclasses within data and to detect the causes of misclassifications.
arXiv Detail & Related papers (2023-12-28T07:33:51Z)
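As a rough illustration of the entry above, one can select a small subset of high-variance neurons and group instances by their binarized activation states; instances sharing a state are candidates for a common concept. The selection heuristic and all names below are assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 64))        # toy activations: 200 instances x 64 neurons

# Select a "principal" subset of neurons -- here simply the highest-variance ones
# (an assumption; the paper defines its own selection criterion).
k = 8
principal = np.argsort(acts.var(axis=0))[-k:]

# Binarize the activation state of the selected neurons and group instances that
# share the same state; each group is a candidate concept (or unlabeled subclass).
states = (acts[:, principal] > 0).astype(int)
groups = {}
for i, s in enumerate(states):
    groups.setdefault(tuple(s), []).append(i)

print(f"{len(groups)} candidate concept groups, largest has "
      f"{max(len(g) for g in groups.values())} instances")
```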
- Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning [4.124934010794795]
Self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data.
We investigate the impact of probabilistic modeling on the information bottleneck, shedding light on a trade-off between compression and preservation of information in both representation and loss space.
Our findings suggest that introducing an additional bottleneck in the loss space can significantly enhance the ability to detect out-of-distribution examples.
arXiv Detail & Related papers (2023-10-27T12:01:16Z)
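A minimal sketch of a probabilistic embedding with an explicit bottleneck term, in the spirit of the compression/preservation trade-off in the entry above (the architecture, the placeholder objective, and the weight on the KL term are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ProbabilisticHead(nn.Module):
    """Maps features to a Gaussian embedding; a KL term to N(0, I) acts as the bottleneck."""
    def __init__(self, d_in=128, d_z=32):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)
        self.logvar = nn.Linear(d_in, d_z)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterized sample
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

head = ProbabilisticHead()
h = torch.randn(16, 128)                 # toy backbone features
z, kl = head(h)
task_loss = z.pow(2).mean()              # placeholder for the self-supervised objective
loss = task_loss + 1e-3 * kl             # the KL weight trades compression for preservation
loss.backward()
print(z.shape, float(kl))
```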
- Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
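The fast/slow split in the entry above can be sketched as amortized initialization plus iterative refinement of the latent. The toy modules, learning rate, and step count below are assumptions; in the full model the amount of refinement would be tied to the network's uncertainty.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
enc = nn.Linear(8, 4)         # amortized (fast) inference network -- toy stand-in
dec = nn.Linear(4, 8)         # generative / prediction model -- toy stand-in

x = torch.randn(1, 8)
z = enc(x).detach().requires_grad_(True)          # fast amortized guess
err0 = ((dec(z) - x) ** 2).mean().item()

# Slow path: iteratively refine the latent by gradient descent on the
# prediction error, starting from the amortized guess.
for _ in range(20):
    err = ((dec(z) - x) ** 2).mean()
    grad, = torch.autograd.grad(err, z)
    z = (z - 0.5 * grad).detach().requires_grad_(True)

err_final = ((dec(z) - x) ** 2).mean().item()
print(f"prediction error: amortized {err0:.3f} -> refined {err_final:.3f}")
```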
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
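One simple way to combine the two ingredients in the entry above is to attribute the change in the class score to the features that differ between an input and its counterfactual; the gradient-times-difference rule and the toy classifier below are illustrative choices, not the paper's exact attribution method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # toy classifier (assumption)

x = torch.randn(1, 1, 28, 28, requires_grad=True)   # real input
x_cf = x.detach() + 0.5 * torch.randn_like(x)       # stand-in counterfactual image

target = model(x)[0].argmax()                        # predicted class of the real input
score = model(x)[0, target]
grad, = torch.autograd.grad(score, x)

# Discriminative attribution: weight the input-vs-counterfactual difference by
# the sensitivity of the predicted class score.
attribution = (x.detach() - x_cf) * grad
print(attribution.abs().sum().item())
```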
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
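The diversity-enforcing idea in the entry above can be sketched independently of the generator: penalize pairwise similarity among a set of latent perturbations so that each candidate explanation edits the input in a different direction. All shapes and weights below are illustrative, and the counterfactual (class-flipping) part of the objective is omitted.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, d = 4, 16
perturbations = torch.randn(K, d, requires_grad=True)   # K candidate latent edits

def diversity_loss(p):
    """Mean pairwise cosine similarity among perturbations (lower = more diverse)."""
    p = F.normalize(p, dim=1)
    sim = p @ p.t()
    off_diag = sim - torch.eye(K)
    return off_diag.abs().sum() / (K * (K - 1))

# In the full method this term would be added to the counterfactual loss;
# here we only show the diversity part.
loss = diversity_loss(perturbations)
loss.backward()
print(float(loss))
```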
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- How does this interaction affect me? Interpretable attribution for feature interactions [19.979889568380464]
We propose an interaction attribution and detection framework called Archipelago.
Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods.
We also provide accompanying visualizations of our approach that give new insights into deep neural networks.
arXiv Detail & Related papers (2020-06-19T05:14:24Z)
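Pairwise feature-interaction detection of the kind such frameworks build on can be illustrated with a discrete mixed-difference test; this generic test is only a stand-in for the Archipelago framework's actual procedure, and the toy model below is an assumption.

```python
import numpy as np

def f(x):
    """Toy model with a genuine interaction between features 0 and 1."""
    return 2.0 * x[0] * x[1] + 0.5 * x[2]

def interaction_strength(f, x, baseline, i, j):
    """Mixed difference f(x_i, x_j) - f(x_i) - f(x_j) + f(baseline): nonzero
    only if features i and j jointly influence the output."""
    def swap(idx):
        z = baseline.copy()
        z[list(idx)] = x[list(idx)]
        return z
    return f(swap((i, j))) - f(swap((i,))) - f(swap((j,))) + f(baseline)

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print((i, j), round(interaction_strength(f, x, baseline, i, j), 3))
# -> (0, 1) shows a strong interaction; the other pairs are ~0.
```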
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.