ConceptDistil: Model-Agnostic Distillation of Concept Explanations
- URL: http://arxiv.org/abs/2205.03601v1
- Date: Sat, 7 May 2022 08:58:54 GMT
- Title: ConceptDistil: Model-Agnostic Distillation of Concept Explanations
- Authors: João Bento Sousa, Ricardo Moreira, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro
- Abstract summary: Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
- Score: 4.462334751640166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept-based explanations aim to fill the model interpretability gap
for non-technical humans-in-the-loop. Previous work has focused on providing
concepts for specific models (e.g., neural networks) or data types (e.g.,
images), either by extracting concepts from an already trained network or by
training self-explainable models through multi-task learning. In this work, we
propose ConceptDistil, a method to bring concept explanations to any black-box
classifier using knowledge distillation. ConceptDistil is decomposed into two
components: (1) a concept model that predicts which domain concepts are present
in a given instance, and (2) a distillation model that tries to mimic the
predictions of a black-box model using the concept model's predictions. We
validate ConceptDistil in a real-world use case, showing that it is able to
optimize both tasks, bringing concept-explainability to any black-box model.
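Since the abstract only names the two components, the following is a minimal illustrative sketch of how they might fit together. This is not the authors' code: the class names, dimensions, and the simple linear distillation head are all assumptions made here for illustration.

```python
# Illustrative sketch of ConceptDistil's two components (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptModel(nn.Module):
    """Predicts which domain concepts are present in a given instance."""
    def __init__(self, in_dim: int, n_concepts: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_concepts),
        )

    def forward(self, x):
        # Sigmoid because concept presence is a multi-label prediction.
        return torch.sigmoid(self.net(x))

class DistillationModel(nn.Module):
    """Mimics the black-box score using only the concept predictions."""
    def __init__(self, n_concepts: int):
        super().__init__()
        self.head = nn.Linear(n_concepts, 1)  # linear head: an assumption here

    def forward(self, concepts):
        return torch.sigmoid(self.head(concepts)).squeeze(-1)

def multitask_loss(x, concept_labels, blackbox_scores,
                   concept_model, distill_model, alpha=0.5):
    """Joint objective: concept prediction plus distillation of the black box."""
    c_hat = concept_model(x)
    y_hat = distill_model(c_hat)
    concept_loss = F.binary_cross_entropy(c_hat, concept_labels)
    distill_loss = F.binary_cross_entropy(y_hat, blackbox_scores)
    return alpha * concept_loss + (1 - alpha) * distill_loss
```

Under this sketch, an explanation for an instance is read off the concept model's outputs, while the distillation head indicates how each predicted concept moves the mimicked black-box score.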
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Concept Bottleneck Models Without Predefined Concepts [26.156636891713745]
We introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes.
We show that our approach improves downstream performance and narrows the performance gap to black-box models.
arXiv Detail & Related papers (2024-07-04T13:34:50Z)
- Separable Multi-Concept Erasure from Diffusion Models [52.51972530398691]
We propose a Separable Multi-concept Eraser (SepME) to eliminate unsafe concepts from large-scale diffusion models.
It separates optimizable model weights, making each weight increment correspond to a specific concept erasure.
Extensive experiments indicate the efficacy of our approach in eliminating concepts, preserving model performance, and offering flexibility in the erasure or recovery of various concepts.
arXiv Detail & Related papers (2024-02-03T11:10:57Z)
- Knowledge-Aware Neuron Interpretation for Scene Classification [32.32713349524347]
We propose a knowledge-aware neuron interpretation framework to explain model predictions for image scene classification.
For concept completeness, we present core concepts of a scene based on knowledge graph, ConceptNet, to gauge the completeness of concepts.
For concept fusion, we introduce a knowledge graph-based method known as Concept Filtering, which yields a gain of over 23 percentage points on neuron behaviors for neuron interpretation.
arXiv Detail & Related papers (2024-01-29T01:00:17Z)
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- ConcEPT: Concept-Enhanced Pre-Training for Language Models [57.778895980999124]
ConcEPT aims to infuse conceptual knowledge into pre-trained language models.
It exploits external entity concept prediction to predict the concepts of entities mentioned in the pre-training contexts.
Experiments show that ConcEPT acquires improved conceptual knowledge through concept-enhanced pre-training.
arXiv Detail & Related papers (2024-01-11T05:05:01Z)
- SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc Explanation [11.820167569334444]
This paper introduces the Concept Bottleneck Surrogate Models (SurroCBM) to explain black-box models.
SurroCBM identifies shared and unique concepts across various black-box models and employs an explainable surrogate model for post-hoc explanations.
An effective training strategy using self-generated data is proposed to enhance explanation quality continuously.
arXiv Detail & Related papers (2023-10-11T17:46:59Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs); a minimal sketch of the CAV recipe appears after this list.
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks [10.06397994266945]
We propose MACE: a Model Agnostic Concept Extractor, which can explain the workings of a convolutional network through smaller concepts.
We validate our framework using VGG16 and ResNet50 CNN architectures, and on datasets like Animals With Attributes 2 (AWA2) and Places365.
arXiv Detail & Related papers (2020-11-03T04:40:49Z)
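The Concept Activation Vector referenced in the Human-Centered Concept Explanations entry above is concrete enough to sketch. The code below is a hedged illustration of the standard CAV/TCAV recipe, not any paper's released implementation; `concept_acts`, `random_acts`, and `grads` stand for layer activations and class-logit gradients that a hypothetical model hook would supply.

```python
# Hedged sketch of the CAV/TCAV recipe: train a linear probe separating
# concept examples from random ones in activation space, then test how often
# the class gradient points along the probe's weight vector.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """The CAV is the normalized weight vector of a concept-vs-random probe."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    w = probe.coef_.ravel()
    return w / np.linalg.norm(w)

def tcav_score(grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of inputs whose class-logit gradient (taken w.r.t. the layer
    activations) has a positive directional derivative along the CAV."""
    return float(np.mean(grads @ cav > 0))
```

Note that this recipe assumes the concept is linearly separable in the chosen layer's activation space; the Concept Gradient entry above is aimed precisely at dropping that linearity assumption.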