Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
- URL: http://arxiv.org/abs/2510.24105v1
- Date: Tue, 28 Oct 2025 06:21:06 GMT
- Title: Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
- Authors: Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang,
- Abstract summary: We quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations.<n>We propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability.
- Score: 112.296393156262
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The visual representation of a pre-trained model prioritizes the classifiability on downstream tasks, while the widespread applications for pre-trained visual models have posed new requirements for representation interpretability. However, it remains unclear whether the pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability. In the evaluation of the representation interpretability with different classifiability, we surprisingly discover that the interpretability and classifiability are positively correlated, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. This observation further supports two benefits to the pre-trained representations. First, the classifiability of representations can be further improved by fine-tuning with interpretability maximization. Second, with the classifiability improvement for the representations, we obtain predictions based on their interpretations with less accuracy degradation. The discovered positive correlation and corresponding applications show that practitioners can unify the improvements in interpretability and classifiability for pre-trained vision models. Codes are available at https://github.com/ssfgunner/IIS.
Related papers
- Residualized Similarity for Faithfully Explainable Authorship Verification [18.7598434282115]
Residualized Similarity (RS) is a novel method that supplements systems using interpretable features with a neural network to improve their performance.<n>Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can show how and to what degree the final prediction is faithful and interpretable.
arXiv Detail & Related papers (2025-10-06T20:41:17Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increase the model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - Learning Interpretable Fair Representations [5.660855954377282]
We propose a framework for learning interpretable fair representations during the representation learning process.
In addition to being interpretable, our representations attain slightly higher accuracy and fairer outcomes in a downstream classification task.
arXiv Detail & Related papers (2024-06-24T15:01:05Z) - Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
InterpretCC is a family of intrinsically interpretable neural networks that optimize for ease of human understanding and explanation faithfulness.<n>We show that InterpretCC explanations are found to have higher actionability and usefulness over other intrinsically interpretable approaches.
arXiv Detail & Related papers (2024-02-05T11:55:50Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present textbfDisTIB (textbfTransmitted textbfInformation textbfBottleneck for textbfDisd representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Intermediate Entity-based Sparse Interpretable Representation Learning [37.128220450933625]
Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types.
We propose Intermediate enTity-based Sparse Interpretable Representation Learning (ItsIRL)
arXiv Detail & Related papers (2022-12-03T16:16:11Z) - Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
arXiv Detail & Related papers (2022-05-23T17:38:30Z) - Fair Interpretable Representation Learning with Correction Vectors [60.0806628713968]
We propose a new framework for fair representation learning that is centered around the learning of "correction vectors"
We show experimentally that several fair representation learning models constrained in such a way do not exhibit losses in ranking or classification performance.
arXiv Detail & Related papers (2022-02-07T11:19:23Z) - Desiderata for Representation Learning: A Causal Perspective [104.3711759578494]
We take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning)
This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets.
arXiv Detail & Related papers (2021-09-08T17:33:54Z) - Interpretable Representations in Explainable AI: From Theory to Practice [7.031336702345381]
Interpretable representations are the backbone of many explainers that target black-box predictive systems.
We study properties of interpretable representations that encode presence and absence of human-comprehensible concepts.
arXiv Detail & Related papers (2020-08-16T21:44:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.