Concept Boundary Vectors
- URL: http://arxiv.org/abs/2412.15698v1
- Date: Fri, 20 Dec 2024 09:18:11 GMT
- Title: Concept Boundary Vectors
- Authors: Thomas Walker
- Abstract summary: We introduce concept boundary vectors as a concept vector construction derived from the boundary between the latent representations of concepts. Empirically, we demonstrate that concept boundary vectors capture a concept's semantic meaning, and we compare their effectiveness against concept activation vectors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are trained with relatively simple objectives, such as next token prediction. However, on deployment, they appear to capture a more fundamental representation of their input data. It is of interest to understand the nature of these representations to help interpret the model's outputs and to identify ways to improve the salience of these representations. Concept vectors are constructions aimed at attributing concepts in the input data to directions, represented by vectors, in the model's latent space. In this work, we introduce concept boundary vectors as a concept vector construction derived from the boundary between the latent representations of concepts. Empirically we demonstrate that concept boundary vectors capture a concept's semantic meaning, and we compare their effectiveness against concept activation vectors.
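The abstract does not spell out the construction here, so the following is only an illustrative sketch of one plausible reading: approximate the boundary between two concepts' latent representations with a linear classifier and take its unit normal as a boundary-derived direction, alongside the familiar mean-difference direction for comparison. The function names and synthetic activations are hypothetical, not the paper's method.

```python
import numpy as np
from sklearn.svm import LinearSVC

def boundary_vector(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Illustrative sketch: approximate the boundary between the latent
    representations of concepts A and B with a linear classifier and
    return the unit normal of the separating hyperplane."""
    X = np.vstack([acts_a, acts_b])
    y = np.concatenate([np.ones(len(acts_a)), np.zeros(len(acts_b))])
    w = LinearSVC(C=1.0, max_iter=10_000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)

def mean_difference_vector(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Baseline direction for comparison: difference of the class means."""
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)

# Toy usage with synthetic "latent activations" of two concepts.
rng = np.random.default_rng(0)
acts_a = rng.normal(loc=+1.0, size=(200, 16))
acts_b = rng.normal(loc=-1.0, size=(200, 16))
print("cosine(boundary dir, mean-diff dir):",
      float(boundary_vector(acts_a, acts_b) @ mean_difference_vector(acts_a, acts_b)))
```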
Related papers
- Post-Hoc Concept Disentanglement: From Correlated to Isolated Concept Representations [12.072112471560716]
Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts.
They are trained by identifying directions from the activations of concept samples to those of non-concept samples.
This method produces similar, non-orthogonal directions for correlated concepts, such as "beard" and "necktie".
This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications.
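As a rough illustration of the CAV construction summarised above, and of why correlated concepts yield non-orthogonal directions, the sketch below fits one linear probe per concept and compares the resulting directions. The data and names are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav(concept_acts: np.ndarray, non_concept_acts: np.ndarray) -> np.ndarray:
    """Fit a linear probe separating concept from non-concept activations;
    the CAV is the normalised normal of its decision boundary."""
    X = np.vstack([concept_acts, non_concept_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(non_concept_acts))])
    w = LogisticRegression(max_iter=1_000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)

# Two correlated toy concepts share a latent factor, so their CAVs come out
# non-orthogonal (high cosine similarity), which is the entanglement issue above.
rng = np.random.default_rng(1)
shared = rng.normal(size=(300, 1))
base = rng.normal(size=(300, 8))
acts_beard   = np.hstack([base + shared, shared])
acts_necktie = np.hstack([base + 0.8 * shared, shared])
acts_neither = np.hstack([rng.normal(size=(300, 8)), rng.normal(size=(300, 1))])
print("cosine(CAV_beard, CAV_necktie):",
      float(cav(acts_beard, acts_neither) @ cav(acts_necktie, acts_neither)))
```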
arXiv Detail & Related papers (2025-03-07T15:45:43Z)
- Identifying Linear Relational Concepts in Large Language Models [16.917379272022064]
Transformer language models (LMs) have been shown to represent concepts as directions in the latent space of hidden activations.
We present a technique called linear relational concepts (LRC) for finding concept directions corresponding to human-interpretable concepts.
arXiv Detail & Related papers (2023-11-15T14:01:41Z)
- Simple Mechanisms for Representing, Indexing and Manipulating Concepts [46.715152257557804]
We argue that learning a concept can be done by looking at its moment statistics matrix to generate a concrete representation, or signature, of that concept.
When concepts are 'intersected', their signatures can be used to find a common theme across a number of related 'intersected' concepts.
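The summary only names a moment statistics matrix, so the snippet below is a hedged illustration rather than the paper's exact construction: take the empirical second-moment matrix of a concept's embeddings as its signature, and read a shared theme off the leading eigenvector of the averaged signatures of several related concepts.

```python
import numpy as np

def signature(embeddings: np.ndarray) -> np.ndarray:
    """Illustrative 'signature': the empirical second-moment matrix of the
    embeddings of a concept's examples."""
    return embeddings.T @ embeddings / len(embeddings)

def common_theme(signatures: list) -> np.ndarray:
    """Hedged reading of finding a common theme across related concepts:
    the leading eigenvector of the averaged signatures."""
    avg = sum(signatures) / len(signatures)
    _, eigvecs = np.linalg.eigh(avg)
    return eigvecs[:, -1]  # eigenvector of the largest eigenvalue

# Toy usage: three concepts built around one shared underlying theme.
rng = np.random.default_rng(2)
theme = rng.normal(size=16)
concept_embeddings = [rng.normal(size=(100, 16)) + theme for _ in range(3)]
direction = common_theme([signature(e) for e in concept_embeddings])
print("alignment with the shared theme:",
      float(abs(direction @ theme) / np.linalg.norm(theme)))
```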
arXiv Detail & Related papers (2023-10-18T17:54:29Z)
- Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models [39.514266900301294]
Current strategies for using language models typically represent a concept by averaging the contextualised representations of its mentions in some corpus.
We propose two contrastive learning strategies, based on the view that whenever two sentences reveal similar properties, the corresponding contextualised vectors should also be similar.
One strategy is fully unsupervised, estimating the properties which are expressed in a sentence from the neighbourhood structure of the contextualised word embeddings.
The second strategy instead relies on a distant supervision signal from ConceptNet.
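To make the starting point and objective concrete, here is a hedged sketch of the averaging baseline and an InfoNCE-style contrastive loss in which sentences assumed to reveal similar properties are pulled toward similar vectors; `encode_mention` is a hypothetical stand-in for a contextualised encoder, not an API from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def encode_mention(sentence: str, concept: str) -> np.ndarray:
    """Stand-in for a contextualised language-model encoder; it simply
    returns a random vector here so the sketch stays self-contained."""
    return rng.normal(size=32)

def concept_embedding(mentions) -> np.ndarray:
    """Baseline described above: average the contextualised vectors of a
    concept's mentions in some corpus."""
    return np.mean([encode_mention(s, c) for s, c in mentions], axis=0)

def contrastive_loss(anchor, positive, negatives, temperature=0.1) -> float:
    """InfoNCE-style objective expressing the stated view: vectors of
    sentences that reveal similar properties are pulled together."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / temperature
    return float(-logits[0] + np.log(np.exp(logits).sum()))

mentions = [("A banana is yellow and sweet.", "banana"),
            ("She peeled the banana.", "banana")]
print("concept vector shape:", concept_embedding(mentions).shape)
print("toy contrastive loss:", contrastive_loss(rng.normal(size=32), rng.normal(size=32),
                                                [rng.normal(size=32) for _ in range(4)]))
```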
arXiv Detail & Related papers (2023-05-16T20:17:02Z)
- Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations [53.973055975918655]
We show that intermediate layer representations become more interpretable when transformed to the bases extracted with our method.
We compare the bases extracted with our method against those derived with a supervised approach and find that, in one respect, the proposed unsupervised approach has a strength that constitutes a limitation of the supervised one, and we give potential directions for future research.
arXiv Detail & Related papers (2023-03-19T00:37:19Z)
- Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs).
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
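One simple geometric reading of such compositional structure is additive: a composite phrase embedding roughly equals the sum of its parts' embeddings. The sketch below only simulates this with a toy stand-in encoder (`fake_vlm_embed` is hypothetical) to show how an additive-structure test could be run against a real VLM.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 64
prims = {w: rng.normal(size=dim) for w in ["blue", "red", "car", "bird"]}

def fake_vlm_embed(phrase: str) -> np.ndarray:
    """Stand-in for a VLM text encoder: composite phrase embeddings are
    simulated as the sum of primitive embeddings plus noise, so the
    additive-structure test below has something to detect."""
    v = sum(prims[w] for w in phrase.split())
    return v + 0.1 * rng.normal(size=dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Test for additive compositional structure: does embed("blue car")
# line up with embed("blue") + embed("car")?
print("blue car vs blue+car:",
      cosine(fake_vlm_embed("blue car"), fake_vlm_embed("blue") + fake_vlm_embed("car")))
print("blue car vs red+bird:",
      cosine(fake_vlm_embed("blue car"), fake_vlm_embed("red") + fake_vlm_embed("bird")))
```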
arXiv Detail & Related papers (2023-02-28T08:11:56Z)
- Towards a Deeper Understanding of Concept Bottleneck Models Through End-to-End Explanation [2.9740255333669454]
Concept Bottleneck Models (CBMs) first map raw input(s) to a vector of human-defined concepts, before using this vector to predict a final classification.
In doing so, CBMs support human interpretation: explanations of the model's outputs can visualise the input features that correspond to each concept.
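To make the two-stage structure concrete, here is a minimal, hedged sketch of a concept bottleneck pipeline built from generic linear models rather than the paper's architecture: one probe per human-defined concept, followed by a classifier that only ever sees the concept predictions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ConceptBottleneck:
    """Minimal sketch: raw input -> vector of concept predictions -> label."""

    def __init__(self, n_concepts: int):
        self.concept_models = [LogisticRegression(max_iter=1_000) for _ in range(n_concepts)]
        self.label_model = LogisticRegression(max_iter=1_000)

    def fit(self, X, concept_labels, y):
        for model, labels in zip(self.concept_models, concept_labels.T):
            model.fit(X, labels)
        self.label_model.fit(self.predict_concepts(X), y)
        return self

    def predict_concepts(self, X):
        # The bottleneck: every downstream prediction is mediated by these scores.
        return np.column_stack([m.predict_proba(X)[:, 1] for m in self.concept_models])

    def predict(self, X):
        return self.label_model.predict(self.predict_concepts(X))

# Toy usage: three hypothetical human-defined concepts; the label depends only on them.
rng = np.random.default_rng(5)
X = rng.normal(size=(400, 10))
concepts = (X[:, :3] > 0).astype(int)
y = (concepts.sum(axis=1) >= 2).astype(int)
model = ConceptBottleneck(n_concepts=3).fit(X, concepts, y)
print("train accuracy:", float((model.predict(X) == y).mean()))
```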
arXiv Detail & Related papers (2023-02-07T16:43:43Z)
- Concept Activation Regions: A Generalized Framework For Concept-Based Explanations [95.94432031144716]
Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
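A hedged sketch of the idea: replace the linear probe with a nonlinear (here RBF-kernel) classifier, so the concept's positive region can consist of several disjoint clusters of latent space, and read a per-example concept score off the classifier's decision function. The data and classifier choice are illustrative, not necessarily the paper's.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
# Concept examples lie in two separate clusters of the latent space,
# which a single linear direction cannot capture.
concept_acts = np.vstack([rng.normal(loc=+3.0, size=(100, 2)),
                          rng.normal(loc=-3.0, size=(100, 2))])
other_acts = rng.normal(loc=0.0, scale=0.7, size=(200, 2))

X = np.vstack([concept_acts, other_acts])
y = np.concatenate([np.ones(200), np.zeros(200)])

# The "region" is the positive set of a nonlinear classifier; it may be
# a union of disjoint clusters rather than a half-space.
region = SVC(kernel="rbf", gamma="scale").fit(X, y)

def concept_score(z) -> float:
    """Local, per-example concept evidence: signed distance to the region boundary."""
    return float(region.decision_function(np.atleast_2d(z))[0])

print("inside one cluster:", concept_score([3.0, 3.0]))
print("outside the region:", concept_score([0.0, 0.0]))
```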
arXiv Detail & Related papers (2022-09-22T17:59:03Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
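One hedged reading of the idea, using toy differentiable functions and finite differences rather than the paper's setup: define a possibly nonlinear concept predictor on the activations, use its local gradient in place of a fixed CAV direction, and measure the model output's sensitivity along that direction.

```python
import numpy as np

def numerical_grad(f, a: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Finite-difference gradient of a scalar function of the activations."""
    g = np.zeros_like(a)
    for i in range(len(a)):
        step = np.zeros_like(a)
        step[i] = eps
        g[i] = (f(a + step) - f(a - step)) / (2 * eps)
    return g

# Toy, nonlinear concept predictor and model output defined on the activations.
concept_fn = lambda a: np.tanh(a[0] * a[1])
output_fn = lambda a: a[0] ** 2 + 0.5 * a[2]

a = np.array([0.8, -0.3, 1.2])
grad_concept = numerical_grad(concept_fn, a)  # local concept direction (not a fixed CAV)
grad_output = numerical_grad(output_fn, a)

# Sensitivity of the model output along the local concept direction.
attribution = grad_output @ grad_concept / (np.linalg.norm(grad_concept) ** 2)
print("concept-gradient-style attribution:", float(attribution))
```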
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAVs).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then come up with a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.