FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
- URL: http://arxiv.org/abs/2510.25512v1
- Date: Wed, 29 Oct 2025 13:35:46 GMT
- Title: FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
- Authors: Amin Parchami-Araghi, Sukrut Rao, Jonas Fischer, Bernt Schiele
- Abstract summary: Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced.
- Score: 56.796533084868884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. Many post-hoc concept-based approaches have been introduced to understand their workings, yet they are not always faithful to the model. Further, they make restrictive assumptions on the concepts a model learns, such as class-specificity, small spatial extent, or alignment to human expectations. In this work, we put emphasis on the faithfulness of such concept-based explanations and propose a new model with model-inherent mechanistic concept-explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input-visualization can be faithfully traced. We also leverage foundation models to propose a new concept-consistency metric, C$^2$-Score, that can be used to evaluate concept-based methods. We show that, compared to prior work, our concepts are quantitatively more consistent and users find our concepts to be more interpretable, all while retaining competitive ImageNet performance.
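To build intuition for why the abstract's "contribution to the logit ... can be faithfully traced" claim is possible by construction, the toy sketch below shows a linear readout over concept activations, where each concept's share of a class logit is exact and sums back to the logit. This is an illustrative assumption, not the FaCT architecture; the names `ConceptReadout` and `concept_contributions` are invented for this example.

```python
# Illustrative sketch only: NOT the FaCT architecture. It shows how a linear
# readout over concept activations makes per-concept logit contributions
# exactly traceable (they sum back to the logit).
import torch
import torch.nn as nn

class ConceptReadout(nn.Module):
    """Toy head: class logits are a linear combination of concept activations."""
    def __init__(self, num_concepts: int, num_classes: int):
        super().__init__()
        self.readout = nn.Linear(num_concepts, num_classes, bias=False)

    def forward(self, concept_acts: torch.Tensor) -> torch.Tensor:
        # concept_acts: (batch, num_concepts) -> logits: (batch, num_classes)
        return self.readout(concept_acts)

    def concept_contributions(self, concept_acts: torch.Tensor, cls: int) -> torch.Tensor:
        # Per-concept contribution to the logit of class `cls`.
        # Summing over concepts reproduces that logit exactly (faithful by construction).
        return concept_acts * self.readout.weight[cls]

# Usage: contributions sum back to the logit, so the attribution is exact.
head = ConceptReadout(num_concepts=8, num_classes=3)
acts = torch.rand(1, 8)
contrib = head.concept_contributions(acts, cls=1)
assert torch.allclose(contrib.sum(), head(acts)[0, 1])
```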
Related papers
- Insight: Interpretable Semantic Hierarchies in Vision-Language Encoders [52.94006363830628]
Language-aligned vision foundation models perform strongly across diverse downstream tasks. Recent works decompose these representations into human-interpretable concepts, but provide poor spatial grounding and are limited to image classification tasks. We propose Insight, a language-aligned concept foundation model that provides fine-grained concepts, which are human-interpretable and spatially grounded in the input image.
arXiv Detail & Related papers (2026-01-20T09:57:26Z) - Towards Better Generalization and Interpretability in Unsupervised Concept-Based Models [9.340843984411137]
This paper introduces a novel unsupervised concept-based model for image classification, named Learnable Concept-Based Model (LCBM). We demonstrate that LCBM surpasses existing unsupervised concept-based models in generalization capability and nearly matches the performance of black-box models. Despite the use of concept embeddings, we maintain model interpretability by means of a local linear combination of concepts.
arXiv Detail & Related papers (2025-06-02T16:26:41Z) - Walking the Web of Concept-Class Relationships in Incrementally Trained Interpretable Models [25.84386438333865]
We show that concepts and classes form a complex web of relationships, which is susceptible to degradation and needs to be preserved and augmented across experiences. We propose a novel method - MuCIL - that uses multimodal concepts to perform classification without increasing the number of trainable parameters across experiences.
arXiv Detail & Related papers (2025-02-27T18:59:29Z) - OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. We propose OmniPrism, a visual concept disentangling approach for creative image generation. Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv Detail & Related papers (2024-12-16T18:59:52Z) - Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z) - A Self-explaining Neural Architecture for Generalizable Concept Learning [29.932706137805713]
We show that present SOTA concept learning approaches suffer from two major problems - lack of concept fidelity and limited concept interoperability.
We propose a novel self-explaining architecture for concept learning across domains.
We demonstrate the efficacy of our proposed approach over current SOTA concept learning approaches on four widely used real-world datasets.
arXiv Detail & Related papers (2024-05-01T06:50:18Z) - A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are known to form the basis of human reasoning.
We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs).
We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Coarse-to-Fine Concept Bottleneck Models [9.910980079138206]
This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs).
Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity.
Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene.
arXiv Detail & Related papers (2023-10-03T14:57:31Z) - Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts (a minimal CAV-style sketch follows this list).
We propose Concept Gradient (CG), which extends concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z) - Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAV).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
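The Concept Gradient entry above contrasts with CAV, which learns a linear direction for a concept in a layer's activation space and scores class sensitivity as a directional derivative of the logit along that direction. The sketch below illustrates that CAV-style baseline under simplified assumptions; it is not the Concept Gradient method, and the helpers `fit_cav` and `concept_sensitivity` are hypothetical names for this example.

```python
# Minimal CAV-style sketch (illustrative assumptions, not Concept Gradient):
# 1) fit a linear direction separating concept vs. random activations,
# 2) score a class by the directional derivative of its logit along that direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Return a unit vector in activation space pointing toward the concept."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def concept_sensitivity(logit_grad: np.ndarray, cav: np.ndarray) -> float:
    """Directional derivative of a class logit along the CAV direction."""
    return float(logit_grad @ cav)

# Toy usage with random stand-ins for layer activations and a logit gradient.
rng = np.random.default_rng(0)
cav = fit_cav(rng.normal(1.0, 1.0, (50, 16)), rng.normal(0.0, 1.0, (50, 16)))
print(concept_sensitivity(rng.normal(size=16), cav))
```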