Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic
- URL: http://arxiv.org/abs/2507.04380v1
- Date: Sun, 06 Jul 2025 12:55:31 GMT
- Title: Transferring Visual Explainability of Self-Explaining Models through Task Arithmetic
- Authors: Yuya Yoshikawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito,
- Abstract summary: Self-explaining models that perform both tasks in a single inference are effective, but their training incurs substantial labeling and computational costs. This study proposes a method to transfer the visual explainability of self-explaining models, learned in a source domain, to a target domain based on a task arithmetic framework.
- Score: 8.918419734720613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In scenarios requiring both prediction and explanation efficiency for image classification, self-explaining models that perform both tasks in a single inference are effective. However, their training incurs substantial labeling and computational costs. This study aims to tackle the issue by proposing a method to transfer the visual explainability of self-explaining models, learned in a source domain, to a target domain based on a task arithmetic framework. Specifically, we construct a self-explaining model by extending image classifiers based on a vision-language pretrained model. We then define an \emph{explainability vector} as the difference between model parameters trained on the source domain with and without explanation supervision. Based on the task arithmetic framework, we impart explainability to a model trained only on the prediction task in the target domain by applying the explainability vector. Experimental results on various image classification datasets demonstrate that, except for transfers between some less-related domains, visual explainability can be successfully transferred from source to target domains, improving explanation quality in the target domain without sacrificing classification accuracy. Furthermore, we show that the explainability vector learned on a large and diverse dataset like ImageNet, extended with explanation supervision, exhibits universality and robustness, improving explanation quality on nine out of ten different target datasets. We also find that the explanation quality achieved with a single model inference is comparable to that of Kernel SHAP, which requires 150 model inferences.
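A minimal sketch (not the authors' released code) of the parameter-space edit described in the abstract, assuming three checkpoints with identical architectures stored as PyTorch state dicts; the file names and the scaling factor alpha are illustrative assumptions, not values from the paper:
```python
import torch

def explainability_vector(src_with_expl, src_without_expl):
    """Parameter-wise difference between source-domain models trained
    with and without explanation supervision."""
    return {k: src_with_expl[k] - src_without_expl[k] for k in src_with_expl}

def transfer_explainability(target_params, expl_vec, alpha=1.0):
    """Apply the scaled explainability vector to a target-domain
    classifier trained only on the prediction task."""
    return {k: target_params[k] + alpha * expl_vec.get(k, 0.0)
            for k in target_params}

# Hypothetical checkpoint paths for illustration only.
src_expl = torch.load("source_with_explanation.pt")
src_plain = torch.load("source_without_explanation.pt")
target = torch.load("target_prediction_only.pt")

tau = explainability_vector(src_expl, src_plain)
target_explainable = transfer_explainability(target, tau, alpha=1.0)
```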
Related papers
- Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks [9.390951257874187]
We introduce a standardized protocol to quantify informativeness, equivariance, invariance, and disentanglement of factors of variation in model representations. We find that representations from models with similar downstream performance can behave substantially differently with regard to these attributes.
arXiv Detail & Related papers (2025-05-09T17:58:52Z) - From Visual Explanations to Counterfactual Explanations with Latent Diffusion [11.433402357922414]
We propose a new approach to tackle two key challenges in recent prominent works. First, we determine which specific counterfactual features are crucial for distinguishing the "concept" of the target class from the original class. Second, we provide valuable explanations for the non-robust classifier without relying on the support of an adversarially robust model.
arXiv Detail & Related papers (2025-04-12T13:04:00Z) - How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations [69.72654127617058]
Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs). In this work we bring forward empirical evidence that challenges this very notion. We discover a strong dependency on the training details of a pre-trained model's classification layer and demonstrate that they play a crucial role.
arXiv Detail & Related papers (2025-03-01T22:25:11Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during model learning.
Our framework then promotes model learning by paying closer attention to training samples with large differences in their explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using a convolutional model.
Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover more coherent and reliable patterns of the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z) - A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z) - Explainable Deep Classification Models for Domain Generalization [94.43131722655617]
Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision.
Our training strategy enforces a periodic saliency-based feedback to encourage the model to focus on the image regions that directly correspond to the ground-truth object.
arXiv Detail & Related papers (2020-03-13T22:22:15Z)