GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
- URL: http://arxiv.org/abs/2508.21197v2
- Date: Tue, 09 Sep 2025 20:57:47 GMT
- Title: GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
- Authors: Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
- Abstract summary: Concept Activation Vectors (CAVs) provide a powerful approach for interpreting deep neural networks by quantifying their sensitivity to human-defined concepts. When computed independently at different layers, CAVs often exhibit inconsistencies, making cross-layer comparisons unreliable. We propose the Global Concept Activation Vector (GCAV), a novel framework that unifies CAVs into a single, semantically consistent representation.
- Score: 41.6338086518055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Concept Activation Vectors (CAVs) provide a powerful approach for interpreting deep neural networks by quantifying their sensitivity to human-defined concepts. However, when computed independently at different layers, CAVs often exhibit inconsistencies, making cross-layer comparisons unreliable. To address this issue, we propose the Global Concept Activation Vector (GCAV), a novel framework that unifies CAVs into a single, semantically consistent representation. Our method leverages contrastive learning to align concept representations across layers and employs an attention-based fusion mechanism to construct a globally integrated CAV. By doing so, our method significantly reduces the variance in TCAV scores while preserving concept relevance, ensuring more stable and reliable concept attributions. To evaluate the effectiveness of GCAV, we introduce Testing with Global Concept Activation Vectors (TGCAV) as a method to apply TCAV to GCAV-based representations. We conduct extensive experiments on multiple deep neural networks, demonstrating that our method effectively mitigates concept inconsistency across layers, enhances concept localization, and improves robustness against adversarial perturbations. By integrating cross-layer information into a coherent framework, our method offers a more comprehensive and interpretable understanding of how deep learning models encode human-defined concepts. Code and models are available at https://github.com/Zhenghao-He/GCAV.
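As a rough illustration of the two ingredients the abstract names, the NumPy sketch below fuses per-layer CAVs into one global vector with softmax attention weights and evaluates the standard TCAV score against it. It assumes the layer CAVs are already projected into a shared space (which the paper achieves via contrastive alignment) and uses fixed, illustrative attention logits rather than a learned attention module; it is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def tcav_score(grads, cav):
    """Standard TCAV score: the fraction of examples whose class-logit
    gradient has a positive directional derivative along the CAV."""
    return float(np.mean(grads @ cav > 0))

def fuse_cavs(layer_cavs, attn_logits):
    """Fuse per-layer CAVs (assumed already projected into one shared
    space; GCAV does this via contrastive alignment) into a single
    global vector using softmax attention weights."""
    w = np.exp(attn_logits - np.max(attn_logits))
    w /= w.sum()
    fused = (w[:, None] * np.stack(layer_cavs)).sum(axis=0)
    return fused / np.linalg.norm(fused)

# Toy usage with three layer CAVs in a shared 8-dim space.
rng = np.random.default_rng(0)
layer_cavs = [rng.normal(size=8) for _ in range(3)]
global_cav = fuse_cavs(layer_cavs, attn_logits=np.array([0.2, 1.0, 0.5]))
grads = rng.normal(size=(100, 8))   # stand-in for per-example logit gradients
print(tcav_score(grads, global_cav))
```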
Related papers
- FaCT: Faithful Concept Traces for Explaining Neural Network Decisions [56.796533084868884]
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input visualization can be faithfully traced.
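For intuition on what faithful tracing can mean, the sketch below shows the simplest case: when a class logit is linear in the concept activations, each concept's additive share of the logit can be read off exactly. This is a generic illustration, not FaCT's actual architecture, and `readout_W` is a hypothetical name.

```python
import numpy as np

def concept_contributions(concept_acts, readout_W, class_idx):
    """When logit = sum_c W[class, c] * a_c, each concept's additive
    contribution to the logit is exact, i.e. faithfully traceable."""
    contrib = concept_acts * readout_W[class_idx]            # (n, C)
    # Sanity check: per-concept contributions sum back to the logit.
    assert np.allclose(contrib.sum(axis=1),
                       concept_acts @ readout_W[class_idx])
    return contrib
```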
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
- Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
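A minimal sketch of the concept-bottleneck idea in a reward model: features pass through named concept scores before a linear reward readout, so each concept's contribution to the reward is directly inspectable. Layer sizes, concept names, and the sigmoid here are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ConceptBottleneckRewardHead(nn.Module):
    """Reward = linear readout over human-named concept scores."""
    def __init__(self, feat_dim, concept_names):
        super().__init__()
        self.concept_names = concept_names
        self.to_concepts = nn.Linear(feat_dim, len(concept_names))
        self.to_reward = nn.Linear(len(concept_names), 1)

    def forward(self, features):
        concepts = torch.sigmoid(self.to_concepts(features))   # (n, C)
        return self.to_reward(concepts).squeeze(-1), concepts

# Toy usage with hypothetical concept names.
head = ConceptBottleneckRewardHead(16, ["helpful", "harmless", "concise"])
reward, concepts = head(torch.randn(4, 16))
```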
arXiv Detail & Related papers (2025-07-07T06:26:04Z)
- Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
- FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks [10.20676488210292]
Concept Activation Vectors (CAVs) can identify whether a model learned a concept or not. FastCAV is a novel approach that accelerates the extraction of CAVs by up to 63.6x (46.4x on average).
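For context, the usual CAV is the normal of a linear classifier trained to separate concept from non-concept activations, while a closed-form mean-difference direction avoids the training loop entirely. The sketch below contrasts the two approaches; whether FastCAV's exact estimator matches the mean-difference form shown is an assumption on my part.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav_classifier(acts_concept, acts_random):
    """Standard CAV: normal of a trained linear separator."""
    X = np.vstack([acts_concept, acts_random])
    y = np.r_[np.ones(len(acts_concept)), np.zeros(len(acts_random))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_[0] / np.linalg.norm(clf.coef_[0])

def cav_mean_diff(acts_concept, acts_random):
    """Closed-form direction: difference of class means. Under an
    isotropic-noise assumption it aligns with the optimal linear
    separator and costs a single pass over the activations."""
    v = acts_concept.mean(axis=0) - acts_random.mean(axis=0)
    return v / np.linalg.norm(v)
```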
arXiv Detail & Related papers (2025-05-23T13:31:54Z)
- Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning [68.3379650993108]
CAVE - Concept Aware Volumes for Explanations - is a new direction that unifies interpretability and robustness in image classification. We propose 3D Consistency (3D-C), a metric to measure the spatial consistency of concepts. CAVE achieves competitive classification performance while discovering consistent and meaningful concepts across images in various OOD settings.
arXiv Detail & Related papers (2025-03-17T17:55:15Z)
- Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations [12.072112471560716]
Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts. They are trained by identifying directions from the activations of concept samples to those of non-concept samples. This method produces similar, non-orthogonal directions for correlated concepts, such as "beard" and "necktie". This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications.
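One generic way to obtain isolated, mutually orthogonal concept directions post hoc is Gram-Schmidt orthogonalization of the entangled CAVs, sketched below; the paper's own disentanglement procedure may differ.

```python
import numpy as np

def orthogonalize_cavs(cavs):
    """Gram-Schmidt over a list of CAVs: each direction is made
    orthogonal to all previous ones, so correlated concepts such as
    "beard" and "necktie" no longer share a component. Assumes the
    input vectors are linearly independent."""
    basis = []
    for v in cavs:
        v = np.asarray(v, dtype=float)
        for b in basis:
            v = v - (v @ b) * b          # remove overlap with earlier CAVs
        basis.append(v / np.linalg.norm(v))
    return basis
```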
arXiv Detail & Related papers (2025-03-07T15:45:43Z)
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
We propose a novel framework that aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes. The core idea is to map users and items into discrete codes rich in collaborative information for reliable and informative contrastive view generation.
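As a sketch of the discrete-code ingredient, the snippet below quantizes user/item embeddings to their nearest codebook entries (plain vector quantization). How the codebook is learned and how the codes feed the contrastive views are omitted; the specifics here are assumptions, not the paper's method.

```python
import numpy as np

def assign_codes(embeddings, codebook):
    """Map each user/item embedding to the index of its nearest
    codebook vector, yielding one discrete code per entity."""
    # (n, 1, d) - (1, K, d) -> squared distances of shape (n, K)
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```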
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
- Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models [21.245185285617698]
Visual Concept Connectome (VCC) discovers human-interpretable concepts and their interlayer connections in a fully unsupervised manner.
Our approach simultaneously reveals fine-grained concepts within a layer and connection weightings across all layers, and is amenable to global analysis of network structure.
arXiv Detail & Related papers (2024-04-02T18:40:55Z)
- Exploring Concept Contribution Spatially: Hidden Layer Interpretation with Spatial Activation Concept Vector [5.873416857161077]
Testing with Concept Activation Vectors (TCAV) provides a powerful tool to quantify the contribution of query concepts to a target class.
For images where the target object occupies only a small fraction of the frame, redundant background features can interfere with the TCAV evaluation.
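The interference is easy to see once activations are kept spatially resolved: projecting each position of a feature map onto the CAV gives a per-location concept map, and pooling that map over the whole frame lets a large background drown out a small object. The sketch below illustrates this generic idea, not the paper's exact SACV method.

```python
import numpy as np

def concept_map(feature_map, cav):
    """Per-location concept evidence: project each spatial position's
    channel vector of an (H, W, C) feature map onto a unit CAV."""
    return feature_map @ cav                                   # (H, W)

def pooled_score(feature_map, cav, mask=None):
    """Mean concept evidence; an optional boolean object mask keeps
    background positions from dominating the average."""
    m = concept_map(feature_map, cav)
    return float(m[mask].mean()) if mask is not None else float(m.mean())
```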
arXiv Detail & Related papers (2022-05-21T15:58:57Z)
- Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence [13.618809162030486]
Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. In this paper we show that such a separability-oriented objective leads to solutions that may diverge from the actual goal of precisely modeling the concept direction. We introduce pattern-based CAVs, focusing solely on concept signals, thereby providing more accurate concept directions.
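A pattern-style direction can be written in closed form as the covariance between activations and the binary concept label, which for binary labels is proportional to the difference of class means. Below is a minimal sketch of that reading; the paper's exact estimator may differ.

```python
import numpy as np

def pattern_cav(acts, labels):
    """Covariance of each activation dimension with the binary concept
    label: captures the concept signal itself, rather than the normal
    of a separating hyperplane, which also encodes distractor noise."""
    y = np.asarray(labels, dtype=float)
    v = (acts - acts.mean(axis=0)).T @ (y - y.mean()) / len(y)
    return v / np.linalg.norm(v)
```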
arXiv Detail & Related papers (2022-02-07T19:40:20Z)