Posthoc Interpretation via Quantization
- URL: http://arxiv.org/abs/2303.12659v2
- Date: Sat, 27 May 2023 12:26:23 GMT
- Title: Posthoc Interpretation via Quantization
- Authors: Francesco Paissan, Cem Subakan, Mirco Ravanelli
- Abstract summary: We introduce a new approach, called Posthoc Interpretation via Quantization (PIQ), for interpreting decisions made by trained classifiers.
Our method utilizes vector quantization to transform the representations of a classifier into a discrete, class-specific latent space.
Our model formulation also enables learning concepts by incorporating the supervision of pretrained annotation models.
- Score: 9.510336895838703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a new approach, called Posthoc Interpretation via
Quantization (PIQ), for interpreting decisions made by trained classifiers. Our
method utilizes vector quantization to transform the representations of a
classifier into a discrete, class-specific latent space. The class-specific
codebooks act as a bottleneck that forces the interpreter to focus on the parts
of the input data deemed relevant by the classifier for making a prediction.
Our model formulation also enables learning concepts by incorporating the
supervision of pretrained annotation models such as state-of-the-art image
segmentation models. We evaluated our method through quantitative and
qualitative studies involving black-and-white images, color images, and audio.
As a result of these studies, we found that PIQ generates interpretations that
participants in our user studies understood more easily than those produced by
several other interpretation methods in the literature.
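The listing contains no code, so here is a minimal, hedged sketch of the mechanism the abstract describes: snapping a classifier latent onto the nearest entry of a class-specific codebook with a straight-through estimator. The module name, shapes, and commitment loss follow standard VQ practice and are our assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn

class ClassSpecificVQ(nn.Module):
    """Sketch: one learned codebook per class; each latent vector is
    snapped to the nearest code of its predicted class's codebook."""

    def __init__(self, num_classes: int, codebook_size: int, dim: int):
        super().__init__()
        # one (codebook_size x dim) codebook per class
        self.codebooks = nn.Parameter(torch.randn(num_classes, codebook_size, dim))

    def forward(self, z: torch.Tensor, class_idx: torch.Tensor):
        # z: (batch, dim) latents from the frozen classifier
        # class_idx: (batch,) class predicted by the classifier
        books = self.codebooks[class_idx]                      # (batch, K, dim)
        dists = torch.cdist(z.unsqueeze(1), books).squeeze(1)  # (batch, K)
        codes = books[torch.arange(z.size(0)), dists.argmin(dim=1)]
        # straight-through estimator: quantize forward, copy gradients back
        z_q = z + (codes - z).detach()
        commit_loss = (codes.detach() - z).pow(2).mean()
        return z_q, commit_loss
```
In a PIQ-style interpreter, z_q would then be decoded into the interpretation, so only information that survives the class-specific bottleneck can appear in it.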
Related papers
- Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification [5.087579454836169]
State-of-the-art explainability methods generate saliency maps to show where a specific class is identified.
We introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network.
We also show an approach to generate global explanations by aggregating labels across multiple images.
arXiv Detail & Related papers (2024-05-06T09:21:35Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
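A minimal sketch of the linear-probe recipe above: since I(Z;Y) >= H(Y) - CE(probe), the cross-entropy of a trained linear probe yields a lower bound on the mutual information. The function assumes integer labels and frozen representation arrays; it is an illustration, not the paper's code.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def probe_mi_lower_bound(z_train, y_train, z_test, y_test):
    """I(Z;Y) >= H(Y) - CE(probe), in nats; a tighter probe raises the bound."""
    probe = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    ce = log_loss(y_test, probe.predict_proba(z_test), labels=probe.classes_)
    # empirical label entropy H(Y); y_test assumed to be non-negative ints
    p = np.bincount(y_test) / len(y_test)
    h_y = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return h_y - ce
```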
- TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models [14.019349267520541]
We propose a novel method that leverages the capabilities of language models to interpret the learned features of pre-trained image classifiers.
Our approach generates a vast number of sentences to explain the features learned by the classifier for a given image.
Our method is the first to use the words that occur frequently in these sentences to provide insights into the decision-making process behind a visual representation.
arXiv Detail & Related papers (2023-09-01T20:59:46Z)
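A hedged sketch of the frequent-word summarization step described above; generate_sentence stands in for a hypothetical feature-to-text sampler built on a frozen language model and is not a real API.
```python
from collections import Counter
import re

def summarize_feature(feature, generate_sentence, n_samples=100, top_k=10):
    # generate_sentence: hypothetical callable mapping a visual feature to
    # one sampled natural-language sentence (e.g. via a frozen LM).
    sentences = [generate_sentence(feature) for _ in range(n_samples)]
    counts = Counter(w for s in sentences
                     for w in re.findall(r"[a-z]+", s.lower()))
    return counts.most_common(top_k)  # frequent words characterize the feature
```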
- A Test Statistic Estimation-based Approach for Establishing Self-interpretable CNN-based Binary Classifiers [7.424003880270276]
Post-hoc interpretability methods have the limitation that they can produce plausible but different interpretations.
Unlike traditional post-hoc interpretability methods, the proposed method is self-interpretable and quantitative.
arXiv Detail & Related papers (2023-03-13T05:51:35Z)
- Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing [97.70862116338554]
We investigate the problem of measuring interpretability of self-supervised representations.
We formulate this as estimating the mutual information between the representation and a space of manually labelled concepts.
We use our method to evaluate a large number of self-supervised representations, ranking them by interpretability.
arXiv Detail & Related papers (2022-09-07T16:18:50Z)
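The measurement above can be approximated with off-the-shelf tools: quantize the representation and compute the discrete mutual information with the concept labels. K-means here is an assumed stand-in for the paper's quantizer, not its actual pipeline.
```python
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

def quantized_probe_mi(representations, concept_labels, n_codes=128, seed=0):
    """Quantize continuous representations into discrete codes, then score
    interpretability as MI (nats) between code and concept label."""
    codes = KMeans(n_clusters=n_codes, n_init=10,
                   random_state=seed).fit_predict(representations)
    return mutual_info_score(concept_labels, codes)
```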
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- Autoregressive Co-Training for Learning Discrete Speech Representations [19.400428010647573]
We consider a generative model with discrete latent variables that learns a discrete representation for speech.
We find that the proposed approach learns a discrete representation that is highly correlated with phonetic units.
arXiv Detail & Related papers (2022-03-29T18:17:18Z)
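A generic sketch of the discrete-latent idea, using a Gumbel-softmax quantizer over speech frames; this illustrates learning discrete representations but is not the paper's autoregressive co-training objective, and all names and shapes are assumed.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelQuantizer(nn.Module):
    """Frame features -> logits over a code inventory -> differentiable
    one-hot sample via Gumbel-softmax."""

    def __init__(self, feat_dim=80, n_codes=256, tau=1.0):
        super().__init__()
        self.to_logits = nn.Linear(feat_dim, n_codes)
        self.tau = tau

    def forward(self, frames):           # frames: (batch, time, feat_dim)
        logits = self.to_logits(frames)  # (batch, time, n_codes)
        one_hot = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        return one_hot.argmax(dim=-1), one_hot  # code indices, relaxed sample
```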
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- Fair Interpretable Representation Learning with Correction Vectors [60.0806628713968]
We propose a new framework for fair representation learning that is centered around the learning of "correction vectors".
We show experimentally that several fair representation learning models constrained in such a way do not exhibit losses in ranking or classification performance.
arXiv Detail & Related papers (2022-02-07T11:19:23Z)
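A hedged sketch of the correction-vector idea above: the fair representation is the original one plus a learned, inspectable correction, z_fair = z + c(z). The linear form of c and the module name are our assumptions.
```python
import torch
import torch.nn as nn

class CorrectionVectorModel(nn.Module):
    """Fairness is imposed by an additive, inspectable correction vector,
    so the change made for fairness stays interpretable."""

    def __init__(self, dim: int):
        super().__init__()
        self.correction = nn.Linear(dim, dim)  # c(z); trained with a fairness loss

    def forward(self, z: torch.Tensor):
        c = self.correction(z)
        return z + c, c  # corrected representation and the correction itself
```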
- Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition [48.06319154279427]
We present a method of instance-based learning that learns similarities between spans.
Our method enables building models that are highly interpretable without sacrificing performance.
arXiv Detail & Related papers (2020-04-29T23:32:42Z)
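A minimal sketch of instance-based span classification as described above: embed spans, retrieve the nearest labelled training spans by cosine similarity, and vote; the retrieved neighbors double as the interpretable evidence. Function and parameter names are illustrative, not the paper's code.
```python
import numpy as np

def classify_span(span_vec, train_vecs, train_labels, k=5):
    """Label a span by its k nearest training spans (cosine similarity)."""
    a = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    b = span_vec / np.linalg.norm(span_vec)
    top = np.argsort(-(a @ b))[:k]  # indices of the k most similar spans
    labels, votes = np.unique(np.asarray(train_labels)[top], return_counts=True)
    return labels[votes.argmax()], top  # prediction and supporting instances
```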
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.