Contrastive Corpus Attribution for Explaining Representations
- URL: http://arxiv.org/abs/2210.00107v2
- Date: Mon, 12 Jun 2023 22:23:56 GMT
- Title: Contrastive Corpus Attribution for Explaining Representations
- Authors: Chris Lin, Hugh Chen, Chanwoo Kim, Su-In Lee
- Abstract summary: Most explanation methods explain a scalar model output.
Recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained.
We propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output.
- Score: 17.07084455770185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the widespread use of unsupervised models, very few methods are
designed to explain them. Most explanation methods explain a scalar model
output. However, unsupervised models output representation vectors, the
elements of which are not good candidates to explain because they lack semantic
meaning. To bridge this gap, recent works defined a scalar explanation output:
a dot product-based similarity in the representation space to the sample being
explained (i.e., an explicand). Although this enabled explanations of
unsupervised models, the interpretation of this approach can still be opaque
because similarity to the explicand's representation may not be meaningful to
humans. To address this, we propose contrastive corpus similarity, a novel and
semantically meaningful scalar explanation output based on a reference corpus
and a contrasting foil set of samples. We demonstrate that contrastive corpus
similarity is compatible with many post-hoc feature attribution methods to
generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that
features important to the corpus are identified. We showcase the utility of
COCOA in two ways: (i) we draw insights by explaining augmentations of the same
image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot
object localization by explaining the similarity of image representations to
jointly learned text representations (CLIP).
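The scalar explanation output described above can be made concrete with a short sketch: contrastive corpus similarity is the explicand's mean representation-space similarity to a user-chosen corpus minus its mean similarity to a foil set, and any post-hoc feature attribution method can then be applied to that scalar. The following is a minimal illustration assuming a generic PyTorch encoder, a normalized dot product as the similarity, and plain gradient saliency as the attribution method; the function names and toy data are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_corpus_similarity(encoder, explicand, corpus, foil):
    """Mean similarity of the explicand's representation to a corpus, minus its
    mean similarity to a foil set (normalized dot product assumed here)."""
    z = F.normalize(encoder(explicand), dim=-1)        # (1, d)
    z_corpus = F.normalize(encoder(corpus), dim=-1)    # (n_corpus, d)
    z_foil = F.normalize(encoder(foil), dim=-1)        # (n_foil, d)
    return (z @ z_corpus.T).mean() - (z @ z_foil.T).mean()

def cocoa_gradient_saliency(encoder, explicand, corpus, foil):
    """Plain gradient saliency of the contrastive corpus similarity with respect
    to the explicand's features; any post-hoc attribution method could be used."""
    x = explicand.clone().requires_grad_(True)
    contrastive_corpus_similarity(encoder, x, corpus, foil).backward()
    return x.grad

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy encoder and data standing in for, e.g., a SimCLR or CLIP image encoder.
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 16))
    explicand = torch.randn(1, 3, 8, 8)   # the sample being explained
    corpus = torch.randn(20, 3, 8, 8)     # samples defining the concept of interest
    foil = torch.randn(50, 3, 8, 8)       # contrasting reference samples
    print(cocoa_gradient_saliency(encoder, explicand, corpus, foil).shape)  # (1, 3, 8, 8)
```

In the showcased settings, the corpus could consist, for instance, of augmentations of an image (SimCLR) or of jointly learned text representations (CLIP), with the foil providing the contrasting reference.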
Related papers
- Conjuring Semantic Similarity [59.18714889874088]
The semantic similarity between two textual expressions measures the distance between their latent 'meaning'.
We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather on the imagery they evoke.
Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models.
arXiv Detail & Related papers (2024-10-21T18:51:34Z)
- Interpreting Vision and Language Generative Models with Semantic Visual Priors [3.3772986620114374]
We develop a framework based on SHAP that allows for generating meaningful explanations leveraging the meaning representation of the output sequence as a whole.
We demonstrate that our method generates semantically more expressive explanations than traditional methods at a lower compute cost.
arXiv Detail & Related papers (2023-04-28T17:10:08Z)
- What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connections between them and sparse retrieval (see the sketch after this list).
arXiv Detail & Related papers (2022-12-20T16:03:25Z)
- Argumentative Explanations for Pattern-Based Text Classifiers [15.81939090849456]
We focus on explanations for a specific interpretable model, namely pattern-based logistic regression (PLR) for binary text classification.
We propose AXPLR, a novel explanation method using (forms of) computational argumentation to generate explanations.
arXiv Detail & Related papers (2022-05-22T21:16:49Z)
- Explaining Latent Representations with a Corpus of Examples [72.50996504722293]
We propose SimplEx: a user-centred method that provides example-based explanations with reference to a freely selected set of examples.
SimplEx uses the corpus to improve the user's understanding of the latent space with post-hoc explanations.
We show that SimplEx empowers the user by highlighting relevant patterns in the corpus that explain model representations.
arXiv Detail & Related papers (2021-10-28T17:59:06Z)
- Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM).
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the Imagenet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
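As a companion to the "What Are You Token About?" entry above, the following is a hedged sketch of reading a dense dual-encoder representation as a distribution over the vocabulary by projecting it through a token-embedding matrix and applying a softmax. The toy vocabulary, embedding matrix, and function names are assumptions for illustration, not that paper's implementation.

```python
import torch

def vocabulary_projection(rep, token_embeddings, vocab, top_k=3):
    """Project a dense representation onto vocabulary space (dot product with each
    token embedding), softmax into a distribution, and return the top-k tokens."""
    logits = token_embeddings @ rep            # (|V|,) score per vocabulary token
    probs = torch.softmax(logits, dim=-1)      # distribution over the vocabulary
    scores, idx = probs.topk(top_k)
    return [(vocab[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]

if __name__ == "__main__":
    torch.manual_seed(0)
    vocab = ["query", "document", "retrieval", "cat", "dog"]   # toy vocabulary
    token_embeddings = torch.randn(len(vocab), 8)              # stand-in embedding matrix
    rep = torch.randn(8)                                       # stand-in dual-encoder output
    print(vocabulary_projection(rep, token_embeddings, vocab))
```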