On the Performance of Concept Probing: The Influence of the Data (Extended Version)
- URL: http://arxiv.org/abs/2507.18550v1
- Date: Thu, 24 Jul 2025 16:18:46 GMT
- Title: On the Performance of Concept Probing: The Influence of the Data (Extended Version)
- Authors: Manuel de Sousa Ribeiro, Afonso Leote, João Leite
- Abstract summary: Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest. Research on concept probing has mainly focused on the model being probed or the probing model itself. In this paper, we investigate the effect of the data used to train probing models on their performance.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Concept probing has recently garnered increasing interest as a way to help interpret artificial neural networks, addressing both their typically large size and their subsymbolic nature, which together render them infeasible for direct human interpretation. Concept probing works by training additional classifiers to map the internal representations of a model into human-defined concepts of interest, thus allowing humans to peek inside artificial neural networks. Research on concept probing has mainly focused on the model being probed or the probing model itself, paying limited attention to the data required to train such probing models. In this paper, we address this gap. Focusing on concept probing in the context of image classification tasks, we investigate the effect of the data used to train probing models on their performance. We also make available concept labels for two widely used datasets.
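The mechanism described in the abstract can be illustrated with a minimal sketch. The PyTorch snippet below trains a simple linear probe on activations captured from one layer of a frozen, pretrained image classifier; the names `backbone`, `layer`, `images`, and `concept_labels` are placeholders for this illustration and do not reflect the authors' exact pipeline, models, or datasets.

```python
# Minimal sketch of a concept probe (assumed setup, not the authors' exact method):
# a small classifier is trained to predict a human-defined concept label from the
# frozen internal representations of a pretrained model.
import torch
import torch.nn as nn


def extract_activations(backbone: nn.Module, layer: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Run `images` through the frozen `backbone` and capture the output of `layer`."""
    captured = {}
    hook = layer.register_forward_hook(
        lambda module, inputs, output: captured.update(act=output.flatten(1).detach())
    )
    with torch.no_grad():
        backbone(images)
    hook.remove()
    return captured["act"]


def train_probe(activations: torch.Tensor, concept_labels: torch.Tensor,
                epochs: int = 100, lr: float = 1e-3) -> nn.Module:
    """Fit a linear probe mapping layer activations to a binary concept label."""
    probe = nn.Linear(activations.shape[1], 1)
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = probe(activations).squeeze(1)
        loss = loss_fn(logits, concept_labels.float())
        loss.backward()
        optimizer.step()
    return probe
```

The accuracy of such a probe is then read as evidence of whether, and how accessibly, the concept is encoded at that layer; the paper's focus is on how the data used to fit the probe affects that measurement.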
Related papers
- Concept Probing: Where to Find Human-Defined Concepts (Extended Version)
We propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
arXiv Detail & Related papers (2025-07-24T16:30:10Z)
- Concept-Guided Interpretability via Neural Chunking
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
We introduce DIFfusionHOI, a new HOI detector shedding light on text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
arXiv Detail & Related papers (2024-10-26T12:00:33Z)
- Automatic Discovery of Visual Circuits
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Attributing Learned Concepts in Neural Networks to Training Data
We find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network.
This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
arXiv Detail & Related papers (2023-10-04T20:26:59Z)
- Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification
Post-hoc analysis can only discover the patterns or rules that naturally exist in models.
We proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers.
Our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
arXiv Detail & Related papers (2023-07-10T04:54:05Z)
- On Modifying a Neural Network's Perception
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
- Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
- Human-Understandable Decision Making for Visual Recognition
We propose a new framework to train a deep neural network by incorporating the prior of human perception into the model learning process.
The effectiveness of our proposed model is evaluated on two classical visual recognition tasks.
arXiv Detail & Related papers (2021-03-05T02:07:33Z)
- Deep Co-Attention Network for Multi-View Subspace Learning
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)