Related papers: Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence

URL: http://arxiv.org/abs/2202.03482v3
Date: Wed, 07 May 2025 08:08:45 GMT
Title: Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Authors: Frederik Pahde, Maximilian Dreyer, Leander Weber, Moritz Weckbecker, Christopher J. Anders, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
Abstract summary: Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. In this paper we show that a separability-oriented computation of CAVs leads to solutions that may diverge from the actual goal of precisely modeling the concept direction. We introduce pattern-based CAVs, solely focussing on concept signals, thereby providing more accurate concept directions.
Score: 13.618809162030486
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With a growing interest in understanding neural network prediction strategies, Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. Commonly, CAVs are computed by leveraging linear classifiers optimizing the separability of latent representations of samples with and without a given concept. However, in this paper we show that such a separability-oriented computation leads to solutions, which may diverge from the actual goal of precisely modeling the concept direction. This discrepancy can be attributed to the significant influence of distractor directions, i.e., signals unrelated to the concept, which are picked up by filters (i.e., weights) of linear models to optimize class-separability. To address this, we introduce pattern-based CAVs, solely focussing on concept signals, thereby providing more accurate concept directions. We evaluate various CAV methods in terms of their alignment with the true concept direction and their impact on CAV applications, including concept sensitivity testing and model correction for shortcut behavior caused by data artifacts. We demonstrate the benefits of pattern-based CAVs using the Pediatric Bone Age, ISIC2019, and FunnyBirds datasets with VGG, ResNet, ReXNet, EfficientNet, and Vision Transformer as model architectures.
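
The contrast the abstract draws between filter-based and pattern-based CAVs can be illustrated with a small synthetic sketch. This is not the authors' implementation: the two-dimensional toy data, the least-squares regression used as the "filter" model, and the covariance-based pattern estimate (covariance between activations and concept labels, divided by the label variance) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def filter_cav(acts, labels):
    """Hypothetical filter-based CAV: weights of a least-squares linear
    model that predicts the concept label from the activations."""
    a = acts - acts.mean(axis=0)
    t = labels - labels.mean()
    w, *_ = np.linalg.lstsq(a, t, rcond=None)
    return w


def pattern_cav(acts, labels):
    """Hypothetical pattern-based CAV: per-feature covariance between
    activations and concept labels, divided by the label variance."""
    a = acts - acts.mean(axis=0)
    t = labels - labels.mean()
    return a.T @ t / (t @ t)


# Toy "latent space" (assumed setup): the concept shifts activations along
# true_direction, while a strong label-independent distractor varies along
# a different, non-orthogonal direction.
rng = np.random.default_rng(0)
n = 2000
labels = rng.integers(0, 2, size=n).astype(float)
true_direction = np.array([1.0, 0.0])
distractor = np.array([1.0, 1.0]) / np.sqrt(2.0)
acts = (
    labels[:, None] * true_direction
    + rng.normal(0.0, 2.0, size=(n, 1)) * distractor
    + rng.normal(0.0, 0.1, size=(n, 2))
)

cos_filter = cosine(filter_cav(acts, labels), true_direction)
cos_pattern = cosine(pattern_cav(acts, labels), true_direction)
print(f"filter CAV  vs. true direction: cosine = {cos_filter:.3f}")
print(f"pattern CAV vs. true direction: cosine = {cos_pattern:.3f}")
```

In this toy setup the regression weights rotate away from the concept direction in order to cancel the distractor, whereas the covariance-based estimate is largely unaffected because the distractor is independent of the label. The paper's experiments use real model activations and several linear classifiers, which this sketch does not reproduce.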

Related papers

Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings [20.59727124775316]
We introduce the Neural Concept Verifier (NCV), a unified framework combining PVGs with concept encodings for interpretable, nonlinear classification in high-dimensional settings. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs.
arXiv Detail & Related papers (2025-07-10T08:28:46Z)
Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z)
Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks [10.20676488210292]
Concept Activation Vectors (CAVs) can identify whether a model learned a concept or not. FastCAV is a novel approach that accelerates the extraction of CAVs by up to 63.6x (on average 46.4x).
arXiv Detail & Related papers (2025-05-23T13:31:54Z)
Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations [12.072112471560716]
Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts. They are trained by identifying directions from the activations of concept samples to those of non-concept samples. This method produces similar, non-orthogonal directions for correlated concepts, such as "beard" and "necktie". This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications.
arXiv Detail & Related papers (2025-03-07T15:45:43Z)
Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization [2.163881720692685]
We introduce a new methodology for incorporating interpretability and intervenability into an existing model by integrating Concept Layers into its architecture. Our approach projects the model's internal vector representations into a conceptual, explainable vector space before reconstructing and feeding them back into the model. We evaluate CLs across multiple tasks, demonstrating that they maintain the original model's performance and agreement while enabling meaningful interventions.
arXiv Detail & Related papers (2025-02-19T11:10:19Z)
Sparse autoencoders reveal selective remapping of visual concepts during adaptation [54.82630842681845]
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems. We develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts.
arXiv Detail & Related papers (2024-12-06T18:59:51Z)
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks. We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm. Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model [4.6040036610482655]
The Explain Any Concept (EAC) model is a flexible method for explaining decisions. The EAC model uses a surrogate model with one trainable linear layer to simulate the target model. We show that by introducing an additional nonlinear layer to the original surrogate model, we can improve the performance of the EAC model.
arXiv Detail & Related papers (2024-05-20T07:25:09Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performance. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv Detail & Related papers (2023-11-28T10:53:26Z)
Identifying Linear Relational Concepts in Large Language Models [16.917379272022064]
Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations. We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts.
arXiv Detail & Related papers (2023-11-15T14:01:41Z)
Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method, named Transferable Prototype Learning (TCPL). To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-art methods.
arXiv Detail & Related papers (2023-10-12T06:36:41Z)
Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts. We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
Exploring Concept Contribution Spatially: Hidden Layer Interpretation with Spatial Activation Concept Vector [5.873416857161077]
Testing with Concept Activation Vector (TCAV) presents a powerful tool to quantify the contribution of query concepts to a target class. For some images where the target object only occupies a small fraction of the region, TCAV evaluation may be interfered with by redundant background features.
arXiv Detail & Related papers (2022-05-21T15:58:57Z)
Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for synthesizing images. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors [24.581839689833572]
Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. Recent work on explanations through feature importance of approximate linear models has moved from input-level features to features from mid-layer feature maps in the form of concept activation vectors (CAVs). In this work, we rethink the ACE algorithm of Ghorbani et al., proposing an alternative invertible concept-based explanation (ICE) framework to overcome its shortcomings.
arXiv Detail & Related papers (2020-06-27T17:57:26Z)
MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space. We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.