SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc
Explanation
- URL: http://arxiv.org/abs/2310.07698v1
- Date: Wed, 11 Oct 2023 17:46:59 GMT
- Title: SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc
Explanation
- Authors: Bo Pan, Zhenke Liu, Yifei Zhang, Liang Zhao
- Abstract summary: This paper introduces the Concept Bottleneck Surrogate Models (SurroCBM) to explain black-box models.
SurroCBM identifies shared and unique concepts across various black-box models and employs an explainable surrogate model for post-hoc explanations.
An effective training strategy using self-generated data is proposed to enhance explanation quality continuously.
- Score: 11.820167569334444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable AI seeks to shed light on the decision-making processes of
black-box models. Traditional saliency-based methods, while highlighting
influential data segments, often lack semantic understanding. Recent
advancements, such as Concept Activation Vectors (CAVs) and Concept Bottleneck
Models (CBMs), offer concept-based explanations but necessitate human-defined
concepts. However, human-annotated concepts are expensive to obtain. This paper
introduces the Concept Bottleneck Surrogate Models (SurroCBM), a novel
framework that aims to explain the black-box models with automatically
discovered concepts. SurroCBM identifies shared and unique concepts across
various black-box models and employs an explainable surrogate model for
post-hoc explanations. An effective training strategy using self-generated data
is proposed to enhance explanation quality continuously. Through extensive
experiments, we demonstrate the efficacy of SurroCBM in concept discovery and
explanation, underscoring its potential in advancing the field of explainable
AI.
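To make the surrogate idea concrete, below is a minimal sketch of a concept bottleneck surrogate distilled from a black-box classifier. All names, the two-layer encoder, the concept count, and the KL-distillation loss are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class ConceptBottleneckSurrogate(nn.Module):
    """Surrogate that routes the black-box prediction through a concept layer."""

    def __init__(self, input_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Encoder mapping raw inputs to concept activations in [0, 1];
        # the two-layer MLP and hidden width are illustrative assumptions.
        self.concept_encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_concepts),
            nn.Sigmoid(),
        )
        # Interpretable head: each class score is a weighted sum of concepts.
        self.head = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concepts = self.concept_encoder(x)
        return self.head(concepts), concepts


def distillation_step(surrogate, black_box, x, optimizer):
    """One step of fitting the surrogate to mimic the black-box outputs on a
    batch x (in the paper's setting, x would come from self-generated data)."""
    with torch.no_grad():
        teacher_probs = torch.softmax(black_box(x), dim=-1)
    logits, _ = surrogate(x)
    loss = nn.functional.kl_div(
        torch.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean"
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a setup, explanations read off the linear head: the weight from concept j to class k gives that concept's contribution to the class score.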
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Concept Bottleneck Models Without Predefined Concepts [26.156636891713745]
We introduce an input-dependent concept selection mechanism that ensures only a small subset of concepts is used across all classes.
We show that our approach improves downstream performance and narrows the performance gap to black-box models.
arXiv Detail & Related papers (2024-07-04T13:34:50Z)
- Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performances.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
- Incremental Residual Concept Bottleneck Models [29.388549499546556]
Concept Bottleneck Models (CBMs) map the black-box visual representations extracted by deep neural networks onto a set of interpretable concepts.
We propose the Incremental Residual Concept Bottleneck Model (Res-CBM) to address the challenge of concept completeness.
Our approach can be applied to any user-defined concept bank, as a post-hoc processing method to enhance the performance of any CBMs.
arXiv Detail & Related papers (2024-04-13T12:02:19Z)
- A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are fundamental to human reasoning.
We provide a systematic review and taxonomy of concept representations and their discovery algorithms in Deep Neural Networks (DNNs).
We also review the concept-based model improvement literature, marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets (a sketch of the linear-CAV baseline appears after this list).
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- ConceptDistil: Model-Agnostic Distillation of Concept Explanations [4.462334751640166]
Concept-based explanations aim to fill the model-interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
arXiv Detail & Related papers (2022-05-07T08:58:54Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Promises and Pitfalls of Black-Box Concept Learning Models [26.787383014558802]
We show that machine learning models that incorporate concept learning encode information beyond the pre-defined concepts.
Natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading.
arXiv Detail & Related papers (2021-06-24T21:00:28Z)
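Several of the entries above build on Concept Activation Vectors. As background for the CAV-versus-Concept-Gradient comparison, here is a minimal sketch of the linear-CAV baseline that Concept Gradient generalizes; the logistic probe and function names are illustrative assumptions, not code from either paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_with_concept, acts_without_concept):
    """Fit a linear probe separating activations of concept examples from
    non-concept examples; the normalized weight vector is the CAV."""
    X = np.vstack([acts_with_concept, acts_without_concept])
    y = np.concatenate([np.ones(len(acts_with_concept)),
                        np.zeros(len(acts_without_concept))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = probe.coef_[0]
    return v / np.linalg.norm(v)

def conceptual_sensitivity(grad_output_wrt_activations, cav):
    """Directional derivative of the model output along the concept direction;
    a positive value means the concept pushes the prediction up. Concept
    Gradient replaces this fixed linear direction with the gradient of a
    learned, possibly non-linear concept function."""
    return grad_output_wrt_activations @ cav
```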