Related papers: Open Vocabulary Compositional Explanations for Neuron Alignment

URL: http://arxiv.org/abs/2511.20931v1
Date: Tue, 25 Nov 2025 23:45:37 GMT
Title: Open Vocabulary Compositional Explanations for Neuron Alignment
Authors: Biagio La Rosa, Leilani H. Gilpin
Abstract summary: Motivated by the goal of understanding how neurons encode information, compositional explanations leverage logical relationships between concepts to express the spatial alignment between neuron activations and human knowledge. This paper introduces a framework for the vision domain that allows users to probe neurons for arbitrary concepts and datasets.
Score: 4.497600020881818
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Neurons are the fundamental building blocks of deep neural networks, and their interconnections allow AI to achieve unprecedented results. Motivated by the goal of understanding how neurons encode information, compositional explanations leverage logical relationships between concepts to express the spatial alignment between neuron activations and human knowledge. However, these explanations rely on human-annotated datasets, restricting their applicability to specific domains and predefined concepts. This paper addresses this limitation by introducing a framework for the vision domain that allows users to probe neurons for arbitrary concepts and datasets. Specifically, the framework leverages masks generated by open vocabulary semantic segmentation to compute open vocabulary compositional explanations. The proposed framework consists of three steps: specifying arbitrary concepts, generating semantic segmentation masks using open vocabulary models, and deriving compositional explanations from these masks. The paper compares the proposed framework with previous methods for computing compositional explanations both in terms of quantitative metrics and human interpretability, analyzes the differences in explanations when shifting from human-annotated data to model-annotated data, and showcases the additional capabilities provided by the framework in terms of flexibility of the explanations with respect to the tasks and properties of interest.
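
The third step named in the abstract, deriving a compositional explanation from concept masks, can be illustrated with a small sketch. It is not the authors' implementation: it assumes the IoU-scored beam search over logical formulas used in earlier compositional-explanation work, and the hand-made boolean arrays stand in for masks that an open vocabulary segmentation model would produce. The names `iou`, `OPS`, and `explain`, and the concepts `dog`, `grass`, and `sky`, are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

# Logical operators used to combine a partial formula with one more concept.
OPS = {
    "AND": lambda x, y: np.logical_and(x, y),
    "OR": lambda x, y: np.logical_or(x, y),
    "AND NOT": lambda x, y: np.logical_and(x, np.logical_not(y)),
}

def explain(neuron_mask, concept_masks, max_len=2, beam=3):
    """Beam search for the formula whose mask best overlaps the neuron mask.

    neuron_mask: boolean array marking where the neuron is considered active
                 (assumed to be already thresholded).
    concept_masks: dict mapping a concept name to a boolean array of the
                   same shape (assumed output of a segmentation model).
    Returns (formula_string, iou_score).
    """
    # Score every atomic concept and keep the top `beam` as the frontier.
    cands = [(iou(neuron_mask, m), name, m) for name, m in concept_masks.items()]
    cands.sort(key=lambda c: c[0], reverse=True)
    frontier = cands[:beam]
    best = frontier[0]
    for _ in range(max_len - 1):
        expanded = []
        for _, label, mask in frontier:
            for op_name, op in OPS.items():
                for name, m in concept_masks.items():
                    new_mask = op(mask, m)
                    expanded.append(
                        (iou(neuron_mask, new_mask), f"({label} {op_name} {name})", new_mask)
                    )
        expanded.sort(key=lambda c: c[0], reverse=True)
        frontier = expanded[:beam]
        # Keep the longer formula only if it strictly improves the score.
        if frontier and frontier[0][0] > best[0]:
            best = frontier[0]
    return best[1], best[0]

# Toy 4x4 "image": the neuron fires on the left half or the bottom half.
dog = np.zeros((4, 4), dtype=bool); dog[:, :2] = True      # left half
grass = np.zeros((4, 4), dtype=bool); grass[2:, :] = True  # bottom half
sky = np.zeros((4, 4), dtype=bool); sky[0, :] = True       # top row
neuron = np.logical_or(dog, grass)

label, score = explain(neuron, {"dog": dog, "grass": grass, "sky": sky})
print(label, score)
```

In this toy setting no single concept covers the neuron's active region, while a disjunction of two concepts matches it exactly, which is the kind of alignment a compositional explanation is meant to expose.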

Related papers

From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models [20.244145418997377]
We analyze the conceptual structures learned by speech and textual models both individually and jointly. We employ Latent Concept Analysis, an unsupervised method for uncovering latent representations in neural networks, to examine how semantic abstractions form across modalities.
arXiv Detail & Related papers (2025-06-01T19:33:21Z)
Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z)
Neuro-Symbolic Concepts [72.94541757514396]
This article presents a concept-centric paradigm for building agents that can learn continually and reason flexibly. The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts. This framework offers several advantages, including data efficiency, compositional generalization, continual learning, and zero-shot transfer.
arXiv Detail & Related papers (2025-05-09T17:02:51Z)
Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
CoSy: Evaluating Textual Explanations of Neurons [5.696573924249008]
We introduce CoSy, a framework for evaluating textual explanations of latent neurons. By comparing the neuron's response to generated data points and control data points, we can estimate the quality of the explanation. We validate our framework through sanity checks and benchmark various neuron description methods for Computer Vision tasks.
arXiv Detail & Related papers (2024-05-30T17:59:04Z)
Semantic Parsing for Question Answering over Knowledge Graphs [6.476654097130567]
We propose a novel method for question answering over knowledge graphs based on graph-to-segment mapping. Our framework integrates both rule-based and neural methods to parse and construct accurate semantic segment sequences. We formulate question semantic parsing as a sequence generation task, employing an encoder-decoder neural network to map natural language questions into semantic segments.
arXiv Detail & Related papers (2023-12-01T20:45:06Z)
Mapping Knowledge Representations to Concepts: A Review and New Perspectives [0.6875312133832078]
This review focuses on research that aims to associate internal representations with human understandable concepts. We find this taxonomy and theories of causality useful for understanding what can be expected, and not expected, from neural network explanations. The analysis additionally uncovers an ambiguity in the reviewed literature related to the goal of model explainability.
arXiv Detail & Related papers (2022-12-31T12:56:12Z)
Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings. We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI. This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z)
Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed. Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts. In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images. We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions. Such explanations visualize how words and phrases are combined at different levels of the hierarchy. Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.