Statistically Significant Concept-based Explanation of Image Classifiers
via Model Knockoffs
- URL: http://arxiv.org/abs/2305.18362v2
- Date: Wed, 31 May 2023 03:20:18 GMT
- Title: Statistically Significant Concept-based Explanation of Image Classifiers
via Model Knockoffs
- Authors: Kaiwen Xu, Kazuto Fukuchi, Youhei Akimoto and Jun Sakuma
- Abstract summary: Concept-based explanations may produce false positives,
mistakenly regarding unrelated concepts as important for the prediction task.
We propose a method that uses a deep learning model to learn image concepts
and then uses knockoff samples to select the concepts important for prediction.
- Score: 22.576922942465142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A concept-based classifier can explain the decision process of a deep
learning model through human-understandable concepts in image classification
problems. However, concept-based explanations can sometimes produce false
positives, mistakenly regarding unrelated concepts as important for the
prediction task. Our goal is to find the statistically significant concepts
for classification and thereby prevent misinterpretation. In this study, we
propose a method that uses a deep learning model to learn the image concepts
and then uses knockoff samples to select the concepts important for prediction
while controlling the False Discovery Rate (FDR) below a given level. We
evaluate the proposed method in synthetic and real-data experiments and show
that it controls the FDR properly while selecting highly interpretable
concepts, improving the trustworthiness of the model.
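To make the selection step concrete, below is a minimal sketch of a model-X
knockoff filter applied to concept activations, assuming equi-correlated
Gaussian knockoffs and a lasso-based importance statistic; the function names
are illustrative and this is not the authors' implementation. The FDR being
controlled is the expected fraction of falsely selected concepts among all
selected ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gaussian_knockoffs(X, rng):
    """Equi-correlated Gaussian model-X knockoffs (simplified; needs n > p)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
    s = np.full(p, min(1.0, 1.9 * np.linalg.eigvalsh(Sigma).min()))
    Sigma_inv = np.linalg.inv(Sigma)
    D = np.diag(s)
    # Knockoffs ~ N(X - (X - mu) Sigma^{-1} D,  2D - D Sigma^{-1} D)
    mean = X - (X - mu) @ Sigma_inv @ D
    cov = 2.0 * D - D @ Sigma_inv @ D
    L = np.linalg.cholesky(cov + 1e-8 * np.eye(p))
    return mean + rng.standard_normal((n, p)) @ L.T

def knockoff_select(concept_acts, labels, q=0.2, seed=0):
    """Select concepts whose importance beats their knockoff copy, using
    the knockoff+ threshold to control FDR at level q."""
    rng = np.random.default_rng(seed)
    X = (concept_acts - concept_acts.mean(0)) / (concept_acts.std(0) + 1e-12)
    Xk = gaussian_knockoffs(X, rng)
    clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
    clf.fit(np.hstack([X, Xk]), labels)
    coef = np.abs(clf.coef_).max(axis=0)   # per-feature importance (any class)
    p = X.shape[1]
    W = coef[:p] - coef[p:]                # real-vs-knockoff importance gap
    for t in np.sort(np.abs(W[W != 0])):   # knockoff+ threshold search
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.where(W >= t)[0]     # indices of selected concepts
    return np.array([], dtype=int)
```

Under the model-X assumptions, any statistic that treats a concept and its
knockoff symmetrically yields valid FDR control; the lasso-coefficient gap
above is just one common choice.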
Related papers
- Unlearning or Concealment? A Critical Analysis and Evaluation Metrics for Unlearning in Diffusion Models [7.9993879763024065]
We show that the objective functions used for unlearning in existing methods lead to a decoupling of the targeted concepts from the corresponding prompts.
The ineffectiveness of current methods stems primarily from their narrow focus on reducing generation probabilities for specific prompt sets.
We introduce two new evaluation metrics: Concept Retrieval Score (CRS) and Concept Confidence Score (CCS).
arXiv Detail & Related papers (2024-09-09T14:38:31Z)
- ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance [78.44823280247438]
We present ClassDiffusion, a technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept.
Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts.
In response to the ineffective evaluation of the CLIP-T metric, we introduce the BLIP2-T metric.
arXiv Detail & Related papers (2024-05-27T17:50:10Z) - An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z) - Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improves the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z) - Concept Distillation: Leveraging Human-Centered Explanations for Model
Improvement [3.026365073195727]
Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept (see the CAV sketch after this entry).
We extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning.
We show applications of concept-sensitive training to debias several classification problems.
arXiv Detail & Related papers (2023-11-26T14:00:14Z)
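The CAV construction referenced in the Concept Distillation entry above is the
standard TCAV recipe; a minimal sketch, assuming layer activations for concept
examples and for random counterexamples are already extracted (illustrative,
not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    """CAV: the normal of a linear boundary separating a concept's
    activations from random activations at a chosen layer."""
    X = np.vstack([acts_concept, acts_random])
    y = np.r_[np.ones(len(acts_concept)), np.zeros(len(acts_random))]
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_.ravel()
    return v / np.linalg.norm(v)

def conceptual_sensitivity(grad_logit_wrt_acts, cav):
    """Directional derivative of a class logit along the CAV; the sign
    says whether the concept pushes the prediction up or down."""
    return grad_logit_wrt_acts @ cav
```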
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), which extends concept-based interpretation beyond linear concept functions (see the sketch after this entry).
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
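One way to read the chain-rule idea behind Concept Gradient, sketched under
the assumption that a differentiable concept predictor g is available
alongside the classifier f; the pseudo-inverse attribution below illustrates
going beyond a single linear CAV and is not the paper's exact formulation:

```python
import numpy as np

def concept_gradient(jac_concepts, grad_model):
    """Map the classifier's input-space gradient onto concept space
    through the pseudo-inverse of the concept predictor's Jacobian.
    jac_concepts: (k, d) Jacobian of g at input x
    grad_model:   (d,)   gradient of f's class logit at x
    Returns one attribution score per concept, shape (k,)."""
    return np.linalg.pinv(jac_concepts).T @ grad_model
```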
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the logic expressed in the explanation.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Provable concept learning for interpretable predictions using variational inference [7.0349768355860895]
In safety critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available.
We propose (C)oncept (L)earning and (P)rediction (CLAP), a probabilistic modeling framework derived via variational inference.
We prove that our method can identify the relevant concepts while attaining optimal classification accuracy.
arXiv Detail & Related papers (2022-04-01T14:51:38Z)
- NUQ: Nonparametric Uncertainty Quantification for Deterministic Neural Networks [151.03112356092575]
We show a principled way to measure the uncertainty of a classifier's predictions based on the Nadaraya-Watson nonparametric estimate of the conditional label distribution (see the sketch after this entry).
We demonstrate the strong performance of the method in uncertainty estimation tasks on a variety of real-world image datasets.
arXiv Detail & Related papers (2022-02-07T12:30:45Z)
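The Nadaraya-Watson estimate at the core of NUQ is classical; a minimal
sketch, assuming fixed feature embeddings and a hand-picked Gaussian kernel
bandwidth (illustrative, not the paper's tuned estimator):

```python
import numpy as np
from scipy.spatial.distance import cdist

def nw_label_distribution(X_train, y_train, X_query, bandwidth=1.0):
    """Nadaraya-Watson estimate of p(y | x): a kernel-weighted average of
    one-hot training labels; its entropy serves as an uncertainty score."""
    K = np.exp(-cdist(X_query, X_train, "sqeuclidean") / (2 * bandwidth**2))
    onehot = np.eye(int(y_train.max()) + 1)[y_train]
    probs = (K @ onehot) / K.sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return probs, entropy
```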
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Debiasing Concept-based Explanations with Causal Analysis [4.911435444514558]
We study the problem of the concepts being correlated with confounding information in the features.
We propose a new causal prior graph for modeling the impacts of unobserved variables.
We show that our debiasing method works when the concepts are not complete.
arXiv Detail & Related papers (2020-07-22T15:42:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.