Model Interpretability and Rationale Extraction by Input Mask Optimization
- URL: http://arxiv.org/abs/2508.11388v1
- Date: Fri, 15 Aug 2025 10:41:09 GMT
- Authors: Marc Brinner, Sina Zarriess
- Abstract summary: We propose a new method to generate extractive explanations for predictions made by neural networks. The masking is done using gradient-based optimization combined with a new regularization scheme. We apply the same method to image inputs and obtain high quality explanations for image classifications.
- Score: 2.3020018305241337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concurrent with the rapid progress in the development of neural-network-based models in areas like natural language processing and computer vision, the need for creating explanations for the predictions of these black-box models has risen steadily. We propose a new method to generate extractive explanations for predictions made by neural networks, based on masking parts of the input that the model does not consider to be indicative of the respective class. The masking is done using gradient-based optimization combined with a new regularization scheme that enforces sufficiency, comprehensiveness and compactness of the generated explanation, three properties that are known to be desirable from the related field of rationale extraction in natural language processing. In this way, we bridge the gap between model interpretability and rationale extraction, thereby proving that the latter can be performed without training a specialized model, only on the basis of a trained classifier. We further apply the same method to image inputs and obtain high quality explanations for image classifications, which indicates that the conditions proposed for rationale extraction in natural language processing are more broadly applicable to different input types.
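The idea in the abstract (optimize an input mask by gradient descent so that the kept part is sufficient, the removed part carries no remaining evidence, and the mask stays small) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy logistic classifier, the per-token weights, the plain sparsity penalty standing in for the compactness term, and the names `optimize_mask`, `lam`, `lr` and `steps` are all assumptions made for illustration, and the gradients are derived by hand instead of using an autodiff library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def optimize_mask(weights, bias, lam=0.3, lr=0.5, steps=500):
    """Gradient descent on mask logits for a toy logistic classifier (hypothetical)."""
    theta = [0.0] * len(weights)  # mask logits; mask_i = sigmoid(theta_i)
    for _ in range(steps):
        mask = [sigmoid(t) for t in theta]
        # Sufficiency: the kept part alone should still be assigned to the class.
        p_keep = sigmoid(sum(m * w for m, w in zip(mask, weights)) + bias)
        # Comprehensiveness: the removed part alone should not be.
        p_drop = sigmoid(sum((1 - m) * w for m, w in zip(mask, weights)) + bias)
        for i, (m, w) in enumerate(zip(mask, weights)):
            # d/dm_i of  -log(p_keep) - log(1 - p_drop) + lam * sum(mask)
            # (the lam term is a simple sparsity stand-in for compactness)
            grad_m = -w * (1.0 - p_keep) - w * p_drop + lam
            theta[i] -= lr * grad_m * m * (1.0 - m)  # chain rule through sigmoid
    return [sigmoid(t) for t in theta]


tokens = ["the", "movie", "was", "absolutely", "wonderful"]
weights = [0.0, 0.1, 0.0, 1.5, 3.0]  # made-up per-token evidence for the class
mask = optimize_mask(weights, bias=-1.0)
rationale = [tok for tok, m in zip(tokens, mask) if m > 0.5]
print(rationale)
```

Tokens whose mask value ends above 0.5 form the extracted rationale. With a real neural classifier the same loop would typically rely on automatic differentiation, and the regularizer would follow the scheme described in the paper.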
Related papers
- Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations [53.91818843831925]
We propose NExT-Vid, a novel autoregressive visual generative pretraining framework. We introduce a context-isolated autoregressive predictor to decouple semantic representation from target decoding. Through context-isolated flow-matching pretraining, our approach achieves strong representations.
arXiv Detail & Related papers (2025-12-24T07:07:08Z)
- P-TAME: Explain Any Image Classifier with Trained Perturbations [14.31574090533474]
P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations) is a model-agnostic method for explaining Deep Neural Networks (DNNs). It generates high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers.
arXiv Detail & Related papers (2025-01-29T18:06:08Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach [10.54430941755474]
This paper proposes a post-hoc natural language explanation method that can be applied to any CNN-based classification system. By analysing influential neurons and the corresponding activation maps, the method generates a faithful description of the classifier's decision process. Experimental results show that the NLEs constructed by our method are significantly more plausible and faithful.
arXiv Detail & Related papers (2024-07-30T15:17:15Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
- Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Exploring End-to-End Differentiable Natural Logic Modeling [21.994060519995855]
We explore end-to-end trained differentiable models that integrate natural logic with neural networks.
The proposed model adapts module networks to model natural logic operations, which is enhanced with a memory component to model contextual information.
arXiv Detail & Related papers (2020-11-08T18:18:15Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers [21.594361495948316]
A new line of work on improving model interpretability has just started, and many existing methods require either prior information or human annotations as additional inputs in training.
We propose the variational word mask (VMASK) method to automatically learn task-specific important words and reduce irrelevant information on classification, which ultimately improves the interpretability of model predictions.
arXiv Detail & Related papers (2020-10-01T20:02:43Z)