Explain To Me: Salience-Based Explainability for Synthetic Face Detection Models
- URL: http://arxiv.org/abs/2303.11969v2
- Date: Mon, 27 Mar 2023 16:56:38 GMT
- Title: Explain To Me: Salience-Based Explainability for Synthetic Face Detection Models
- Authors: Colton Crum, Patrick Tinsley, Aidan Boyd, Jacob Piland, Christopher Sweet, Timothy Kelley, Kevin Bowyer, Adam Czajka
- Abstract summary: We propose five methods of leveraging model salience to explain model behavior at scale.
These methods ask: (a) what is the average entropy of a model's salience maps, (b) how does model salience change when fed out-of-set samples, (c) how closely does model salience follow geometrical transformations, (d) what is the stability of model salience across independent training runs, and (e) how does model salience react to salience-guided image degradations.
- Score: 3.0467185351395827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of convolutional neural networks has continued to improve
over the last decade. At the same time, as model complexity grows, it becomes
increasingly difficult to explain model decisions. Such explanations may
be of critical importance for reliable operation of human-machine pairing
setups, or for model selection when the "best" model among many
equally-accurate models must be established. Saliency maps represent one
popular way of explaining model decisions by highlighting image regions models
deem important when making a prediction. However, examining salience maps at
scale is not practical. In this paper, we propose five novel methods of
leveraging model salience to explain model behavior at scale. These methods
ask: (a) what is the average entropy for a model's salience maps, (b) how does
model salience change when fed out-of-set samples, (c) how closely does model
salience follow geometrical transformations, (d) what is the stability of model
salience across independent training runs, and (e) how does model salience
react to salience-guided image degradations. To assess the proposed measures on
a concrete and topical problem, we conducted a series of experiments for the
task of synthetic face detection with two types of models: those trained
traditionally with cross-entropy loss, and those guided by human salience when
training to increase model generalizability. These two types of models are
characterized by different, interpretable properties of their salience maps,
which allows for the evaluation of the correctness of the proposed measures. We
offer source code for each measure along with this paper.
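The paper releases source code for each measure; purely as a rough, self-contained illustration of what two of these measures might look like, below is a minimal NumPy sketch of measure (a), average salience-map entropy, and measure (d), salience stability across independent training runs. Everything here is an assumption for illustration: the function names, the base-2 entropy over a pixel-normalized map, and mean pairwise Pearson correlation as the stability score are not taken from the authors' implementation.

```python
import numpy as np

def salience_entropy(salience_map, eps=1e-12):
    """Measure (a), sketched: Shannon entropy of a salience map treated
    as a probability distribution over pixels. Lower entropy suggests
    salience concentrated on a few regions."""
    p = np.clip(salience_map.astype(np.float64).ravel(), 0.0, None)
    p /= p.sum() + eps                       # normalize to a distribution
    return float(-np.sum(p * np.log2(p + eps)))

def salience_stability(maps):
    """Measure (d), sketched: mean pairwise Pearson correlation between
    salience maps produced by independently trained models on the same
    input image."""
    flat = [m.astype(np.float64).ravel() for m in maps]
    pairs = [(a, b) for i, a in enumerate(flat) for b in flat[i + 1:]]
    return float(np.mean([np.corrcoef(a, b)[0, 1] for a, b in pairs]))

# Toy usage with random stand-in maps (real inputs would be CAM-style
# salience maps extracted from the trained synthetic-face detectors):
rng = np.random.default_rng(0)
maps = [rng.random((224, 224)) for _ in range(3)]
print(f"avg entropy (bits): {salience_entropy(maps[0]):.2f}")
print(f"stability (mean pairwise r): {salience_stability(maps):.3f}")
```

Under these assumptions, low entropy means a model focuses on a few image regions, and high mean pairwise correlation means independently trained models attend to similar regions; these are the kinds of interpretable salience properties the paper uses to contrast cross-entropy-trained models with human-salience-guided ones.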
Related papers
- From Black-box to Causal-box: Towards Building More Interpretable Models [57.23201263629627]
We introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models. We derive a complete graphical criterion that determines whether a given model architecture supports a given counterfactual query.
arXiv Detail & Related papers (2025-10-24T20:03:18Z)
- On the Edge of Memorization in Diffusion Models [25.927892368310868]
We introduce a scientific and mathematical "laboratory" for investigating memorization and generalization in practical diffusion models. Our work provides an analytically tractable and practically meaningful setting for future theoretical and empirical investigations.
arXiv Detail & Related papers (2025-08-25T05:56:05Z)
- Latent diffusion models for parameterization and data assimilation of facies-based geomodels [0.0]
Diffusion models are trained to generate new geological realizations from input fields characterized by random noise.
Latent diffusion models are shown to provide realizations that are visually consistent with samples from geomodeling software.
arXiv Detail & Related papers (2024-06-21T01:32:03Z)
- Ablation Based Counterfactuals [7.481286710933861]
Ablation Based Counterfactuals (ABC) is a method of performing counterfactual analysis that relies on model ablation rather than model retraining.
We demonstrate how such a model can be constructed using an ensemble of diffusion models.
We then use this model to study the limits of training data attribution by enumerating full counterfactual landscapes.
arXiv Detail & Related papers (2024-06-12T06:22:51Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- COSE: A Consistency-Sensitivity Metric for Saliency on Image Classification [21.3855970055692]
We present a set of metrics that utilize vision priors to assess the performance of saliency methods on image classification tasks.
We show that although saliency methods are thought to be architecture-independent, most methods explain transformer-based models better than convolutional-based models.
arXiv Detail & Related papers (2023-09-20T01:06:44Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study in which participants interact with deception detection models trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space, constrained by a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- To what extent do human explanations of model behavior align with actual model behavior? [91.67905128825402]
We investigated the extent to which human-generated explanations of models' inference decisions align with how models actually make these decisions.
We defined two alignment metrics that quantify how well natural language human explanations align with model sensitivity to input words.
We find that a model's alignment with human explanations is not predicted by the model's accuracy on NLI.
arXiv Detail & Related papers (2020-12-24T17:40:06Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
- Learning Invariances for Interpretability using Supervised VAE [0.0]
We learn model invariances as a means of interpreting a model.
We propose a supervised form of variational auto-encoders (VAEs).
We show how, by combining our model with feature attribution methods, it is possible to reach a more fine-grained understanding of the model's decision process.
arXiv Detail & Related papers (2020-07-15T10:14:16Z)