Concept Activation Regions: A Generalized Framework For Concept-Based
Explanations
- URL: http://arxiv.org/abs/2209.11222v1
- Date: Thu, 22 Sep 2022 17:59:03 GMT
- Title: Concept Activation Regions: A Generalized Framework For Concept-Based
Explanations
- Authors: Jonathan Crabbé and Mihaela van der Schaar
- Abstract summary: Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
- Score: 95.94432031144716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concept-based explanations make it possible to understand the predictions of a
deep neural network (DNN) through the lens of concepts specified by users. Existing
methods assume that the examples illustrating a concept are mapped in a fixed
direction of the DNN's latent space. When this holds true, the concept can be
represented by a concept activation vector (CAV) pointing in that direction. In
this work, we propose to relax this assumption by allowing concept examples to
be scattered across different clusters in the DNN's latent space. Each concept
is then represented by a region of the DNN's latent space that includes these
clusters and that we call concept activation region (CAR). To formalize this
idea, we introduce an extension of the CAV formalism that is based on the
kernel trick and support vector classifiers. This CAR formalism yields global
concept-based explanations and local concept-based feature importance. We prove
that CAR explanations built with radial kernels are invariant under latent
space isometries. In this way, CAR assigns the same explanations to latent
spaces that have the same geometry. We further demonstrate empirically that
CARs offer (1) more accurate descriptions of how concepts are scattered in the
DNN's latent space; (2) global explanations that are closer to human concept
annotations and (3) concept-based feature importance that meaningfully relate
concepts with each other. Finally, we use CARs to show that DNNs can
autonomously rediscover known scientific concepts, such as the prostate cancer
grading system.
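As a rough illustration of the formalism described in the abstract, the sketch below shows how a concept activation region might be fit in practice: latent activations of concept positives and negatives are separated with a support vector classifier using a radial (RBF) kernel, and a global explanation is obtained as the fraction of a class's examples whose activations fall inside the region. This is a minimal sketch based only on the abstract; the helper names (`feature_extractor`, `concept_pos`, `concept_neg`, `class_examples`) are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the CAR idea: represent a concept by a region of the DNN's
# latent space learned with a kernel support vector classifier, then measure
# how often a class of inputs activates that region.
# `feature_extractor`, `concept_pos`, `concept_neg`, and `class_examples` are
# hypothetical placeholders, not names from the paper's implementation.
import numpy as np
from sklearn.svm import SVC


def fit_car(feature_extractor, concept_pos, concept_neg):
    """Fit a concept activation region (CAR) on latent activations."""
    h_pos = feature_extractor(concept_pos)  # latent reps of concept examples
    h_neg = feature_extractor(concept_neg)  # latent reps of negative examples
    H = np.concatenate([h_pos, h_neg], axis=0)
    y = np.concatenate([np.ones(len(h_pos)), np.zeros(len(h_neg))])
    # Radial (RBF) kernel: per the abstract, explanations built with radial
    # kernels are invariant under isometries of the latent space.
    car = SVC(kernel="rbf")
    car.fit(H, y)
    return car


def tcar_score(car, feature_extractor, class_examples):
    """Global explanation: fraction of a class's examples whose latent
    representation falls inside the concept activation region."""
    h = feature_extractor(class_examples)
    return float(np.mean(car.predict(h) == 1))
```

Because a radial kernel depends only on pairwise distances between latent representations, any distance-preserving (isometric) transformation of the latent space leaves the fitted region, and hence the explanation, unchanged, which is the intuition behind the invariance result stated in the abstract.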
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Local Concept Embeddings for Analysis of Concept Distributions in DNN Feature Spaces [1.0923877073891446]
We propose a novel concept analysis framework for deep neural networks (DNNs)
Instead of optimizing a single global concept vector on the complete dataset, it generates a local concept embedding (LoCE) vector for each individual sample.
Despite its context sensitivity, our method's concept segmentation performance is competitive with global baselines.
arXiv Detail & Related papers (2023-11-24T12:22:00Z)
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometry-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
- Emergence of Concepts in DNNs? [0.0]
The paper first examines how existing methods actually identify concepts that are supposedly represented in DNNs.
Second, it discusses how conceptual spaces are shaped by a tradeoff between predictive accuracy and compression.
arXiv Detail & Related papers (2022-11-11T11:25:39Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- Sparse Subspace Clustering for Concept Discovery (SSCCD) [1.7319807100654885]
Concepts are key building blocks of higher level human understanding.
Local attribution methods do not allow one to identify coherent model behavior across samples.
We put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers.
arXiv Detail & Related papers (2022-03-11T16:15:48Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAV).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z)