Activation Matching for Explanation Generation
- URL: http://arxiv.org/abs/2509.23051v1
- Date: Sat, 27 Sep 2025 02:12:09 GMT
- Title: Activation Matching for Explanation Generation
- Authors: Pirzada Suhail, Aditya Anand, Amit Sethi
- Abstract summary: We generate minimal, faithful explanations for the decision-making of a pretrained classifier on any given image. We train a lightweight autoencoder to output a binary mask \(m\) such that the explanation \(e = m \odot x\) preserves both the model's prediction and the intermediate activations of \(x\). Our objective combines multi-layer activation matching with KL divergence to align distributions and cross-entropy to retain the top-1 label for both the image and the explanation.
- Score: 10.850989126934317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we introduce an activation-matching--based approach to generate minimal, faithful explanations for the decision-making of a pretrained classifier on any given image. Given an input image \(x\) and a frozen model \(f\), we train a lightweight autoencoder to output a binary mask \(m\) such that the explanation \(e = m \odot x\) preserves both the model's prediction and the intermediate activations of \(x\). Our objective combines: (i) multi-layer activation matching with KL divergence to align distributions and cross-entropy to retain the top-1 label for both the image and the explanation; (ii) mask priors -- L1 area for minimality, a binarization penalty for crisp 0/1 masks, and total variation for compactness; and (iii) abductive constraints for faithfulness and necessity. Together, these objectives yield small, human-interpretable masks that retain classifier behavior while discarding irrelevant input regions, providing practical and faithful minimalist explanations for the decision making of the underlying model.
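The combined objective from the abstract can be sketched as a single loss function. The weights, the L2 metric for activation matching, and all function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def explanation_loss(logits_x, logits_e, acts_x, acts_e, mask,
                     w_kl=1.0, w_ce=1.0, w_act=1.0,
                     w_area=0.01, w_bin=0.1, w_tv=0.01):
    """Sketch of the combined objective (weights are illustrative).

    logits_x, logits_e : classifier logits for image x and explanation e = m * x
    acts_x, acts_e     : lists of intermediate activation arrays from the frozen model
    mask               : 2-D mask m with values in [0, 1]
    """
    p_x, p_e = softmax(logits_x), softmax(logits_e)
    # (i) KL divergence aligns output distributions; cross-entropy keeps the top-1 label
    kl = np.sum(p_x * (np.log(p_x + 1e-8) - np.log(p_e + 1e-8)))
    ce = -np.log(p_e[np.argmax(p_x)] + 1e-8)
    # multi-layer activation matching (L2 here; the paper's exact metric may differ)
    act = sum(np.mean((ax - ae) ** 2) for ax, ae in zip(acts_x, acts_e))
    # (ii) mask priors: L1 area for minimality, a binarization penalty for
    # crisp 0/1 masks, and total variation for spatial compactness
    area = np.mean(np.abs(mask))
    binar = np.mean(mask * (1.0 - mask))
    tv = np.abs(np.diff(mask, axis=0)).sum() + np.abs(np.diff(mask, axis=1)).sum()
    return (w_kl * kl + w_ce * ce + w_act * act
            + w_area * area + w_bin * binar + w_tv * tv)
```

A crisp all-zero mask incurs only the prediction terms, while a uniform 0.5 mask additionally pays the area and binarization penalties, which is what pushes the autoencoder toward small binary masks.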
Related papers
- Minimalist Explanation Generation and Circuit Discovery [10.850989126934317]
In this paper, we introduce an activation-matching based approach to generate minimal explanations for machine learning decisions. We train a lightweight autoencoder to produce binary masks that highlight the decision-critical regions of an image. The minimal explanations so generated also allow us to mechanistically interpret the model's internals.
arXiv Detail & Related papers (2025-09-30T02:43:44Z) - SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models [16.109077391631917]
We show that cross-attention alone provides only very coarse object localization, which can nevertheless provide initial seeds. We also observe that a simple text-guided synthetic image often has a uniform background, in which correspondences are easier to find. Our proposed method, dubbed SeeDiff, generates high-quality masks off-the-shelf from Stable Diffusion.
arXiv Detail & Related papers (2025-07-26T05:44:00Z) - Unsupervised Segmentation by Diffusing, Walking and Cutting [5.6872893893453105]
We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. A key insight is that self-attention probability distributions can be interpreted as a transition matrix for random walks across the image. We show that our approach surpasses all existing methods for zero-shot unsupervised segmentation, achieving state-of-the-art results on COCO-Stuff-27 and Cityscapes.
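The transition-matrix insight can be illustrated with a toy sketch. The two-way spectral cut and all names below are illustrative assumptions; the actual method operates on diffusion-model self-attention at image scale:

```python
import numpy as np

def attention_to_transitions(attn):
    """Row-normalize a self-attention map into a random-walk transition matrix."""
    attn = np.maximum(attn, 0)
    return attn / attn.sum(axis=1, keepdims=True)

def segment_by_walk(attn, n_steps=3):
    """Simulate multi-step random walks, then cut by thresholding the
    second eigenvector (a toy stand-in for a normalized cut)."""
    P = attention_to_transitions(attn)
    Pn = np.linalg.matrix_power(P, n_steps)  # n-step walk probabilities
    vals, vecs = np.linalg.eig(Pn)
    order = np.argsort(-vals.real)
    fiedler = vecs[:, order[1]].real  # second-largest eigenvector partitions the graph
    return (fiedler > np.median(fiedler)).astype(int)
```

On a 4-token attention map with two strongly self-attending pairs, the cut recovers the two groups, which is the mechanism the summary describes at pixel scale.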
arXiv Detail & Related papers (2024-12-06T00:23:18Z) - SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks [19.58547231973585]
We propose a flexible objective termed SparsePO to automatically learn to weight the KL divergence and reward corresponding to each token during PO training. Our method obtains +10% and +3% win-rate points in summarization and dialogue scenarios, respectively.
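A token-weighted preference loss of the kind described might look like the following sketch. The DPO-style pairwise form, the function names, and the fixed weights are assumptions for illustration; SparsePO learns the per-token weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def token_reward(logp_pi, logp_ref, weights, beta=0.1):
    """Weighted implicit reward: per-token weights (learned in SparsePO,
    fixed here for illustration) scale each token's KL/reward contribution."""
    return beta * np.sum(weights * (logp_pi - logp_ref))

def sparse_po_loss(chosen, rejected, beta=0.1):
    """DPO-style pairwise loss over token-weighted rewards.

    chosen / rejected: (policy log-probs, reference log-probs, token weights)
    """
    r_c = token_reward(*chosen, beta=beta)
    r_r = token_reward(*rejected, beta=beta)
    return -np.log(sigmoid(r_c - r_r) + 1e-12)
```

Setting a token's weight to zero removes it from both the reward and the implicit KL term, which is how a sparse mask lets the objective ignore tokens irrelevant to the preference.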
arXiv Detail & Related papers (2024-10-07T15:01:29Z) - MaskInversion: Localized Embeddings via Optimization of Explainability Maps [49.50785637749757]
MaskInversion generates a context-aware embedding for a query image region specified by a mask at test time.
It can be used for a broad range of tasks, including open-vocabulary class retrieval, referring expression comprehension, as well as for localized captioning and image generation.
arXiv Detail & Related papers (2024-07-29T14:21:07Z) - Masked Pre-training Enables Universal Zero-shot Denoiser [12.753764967728973]
We propose a novel zero-shot denoising paradigm, i.e., Masked Pre-train then Iterative fill (MPI).
MPI first trains a model via masking and then employs the pre-trained weights for high-quality zero-shot denoising of a single noisy image.
arXiv Detail & Related papers (2024-01-26T15:58:57Z) - Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z) - An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification [55.28171619580959]
This paper describes an adaptation of the Local Interpretable Model-Agnostic Explanations (LIME) AI method to operate under a biometric verification setting.
arXiv Detail & Related papers (2023-07-25T11:51:14Z) - DFormer: Diffusion-guided Transformer for Universal Image Segmentation [86.73405604947459]
The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model.
At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly-generated masks.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on MS COCO val 2017 set.
arXiv Detail & Related papers (2023-06-06T06:33:32Z) - Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical
Invariance [104.05734286732941]
FINOLA represents each image in the latent space as a first-order autoregressive process.
We demonstrate the ability of FINOLA to auto-regress up to a 256x256 feature map.
We also leverage FINOLA for self-supervised learning by employing a simple masked prediction approach.
arXiv Detail & Related papers (2023-05-25T17:59:50Z) - What You See is What You Classify: Black Box Attributions [61.998683569022006]
We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
arXiv Detail & Related papers (2022-05-23T12:30:04Z) - Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds [16.6833745997519]
Weakly-supervised semantic segmentation (WSSS) has recently gained much attention for its promise to train segmentation models only with image-level labels.
Existing WSSS methods commonly argue that the sparse coverage of CAM incurs the performance bottleneck of WSSS.
This paper provides analytical and empirical evidence that the actual bottleneck may not be sparse coverage but a global thresholding scheme applied after CAM.
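The global-thresholding issue can be seen in a small sketch: the same fixed threshold yields very different masks for CAMs of different contrast. The min-max normalization and the threshold value below are illustrative assumptions:

```python
import numpy as np

def cam_to_mask(cam, threshold):
    """Binarize a class activation map with a global threshold
    (the scheme the paper identifies as the actual bottleneck)."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # min-max normalize
    return (cam > threshold).astype(np.uint8)
```

A sharply peaked CAM and a diffuse low-contrast CAM binarized at the same global threshold produce masks of very different sizes, so a threshold tuned for one activation profile over- or under-segments the other.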
arXiv Detail & Related papers (2022-03-30T04:26:14Z) - Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability [5.387323728379395]
Saliency maps that identify the most informative regions of an image are valuable for model interpretability.
A common approach to creating saliency maps involves generating input masks that mask out portions of an image.
We show that a masking model can be trained with as few as 10 examples per class and still generate saliency maps with only a 0.7-point increase in localization error.
arXiv Detail & Related papers (2020-10-19T18:00:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.