The Geometry of Categorical and Hierarchical Concepts in Large Language Models
- URL: http://arxiv.org/abs/2406.01506v1
- Date: Mon, 3 Jun 2024 16:34:01 GMT
- Title: The Geometry of Categorical and Hierarchical Concepts in Large Language Models
- Authors: Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch
- Abstract summary: We study two foundational questions in this area.
How are categorical concepts, such as 'mammal', 'bird', 'reptile', and 'fish', represented?
How are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded?
- Score: 15.126806053878855
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded? We show how to extend the linear representation hypothesis to answer these questions. We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure. We validate these theoretical results on the Gemma large language model, estimating representations for 957 hierarchically related concepts using data from WordNet.
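To make the claimed geometry concrete, here is a minimal Python sketch, not the authors' exact estimator: it approximates each concept's linear representation as a difference of mean unembedding vectors and checks whether a hierarchical split ('animal' vs. 'plant') is orthogonal to a within-category split ('mammal' vs. 'bird'). The vocabulary, dimension, and random stand-in vectors are hypothetical placeholders; the paper itself works with Gemma's unembedding vectors under a particular (causal) inner product.

```python
# Minimal sketch (not the authors' exact estimator): approximate concept
# directions as differences of mean unembedding vectors, then check the
# orthogonality claim for hierarchically related concepts. The vocabulary,
# dimension, and random vectors below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
d = 256  # representation dimension (placeholder)

vocab = ["dog", "cat", "horse", "robin", "eagle", "sparrow", "oak", "fern"]
unembed = {w: rng.normal(size=d) for w in vocab}  # stand-in for model rows

def concept_vector(words, contrast_words):
    """Difference-of-means estimate of a binary concept direction."""
    pos = np.mean([unembed[w] for w in words], axis=0)
    neg = np.mean([unembed[w] for w in contrast_words], axis=0)
    return pos - neg

# A higher-level split ('animal' vs. 'plant') ...
animal_vs_plant = concept_vector(
    ["dog", "cat", "horse", "robin", "eagle", "sparrow"], ["oak", "fern"])
# ... and a split *within* the 'animal' category ('mammal' vs. 'bird').
mammal_vs_bird = concept_vector(
    ["dog", "cat", "horse"], ["robin", "eagle", "sparrow"])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The paper's claim is that, in the appropriate inner product, the
# within-category split is orthogonal to the higher-level concept. With
# random vectors this only illustrates the computation, not the result.
print("cos(animal_vs_plant, mammal_vs_bird) =",
      round(cosine(animal_vs_plant, mammal_vs_bird), 3))
```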
Related papers
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - The Linear Representation Hypothesis and the Geometry of Large Language Models [12.387530469788738]
Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space.
This paper addresses two closely related questions: What does "linear representation" actually mean, and how do we make sense of geometric notions, such as cosine similarity and projection, in representation space?
We show how to unify all notions of linear representation using counterfactual pairs.
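As a rough illustration of the counterfactual-pair idea, the sketch below estimates a concept direction by averaging embedding differences over pairs that differ only in the target concept. The pairs and random placeholder embeddings are hypothetical; the paper's formal treatment, including its choice of inner product, is more careful than this.

```python
# Hedged sketch of estimating a linear concept representation from
# counterfactual pairs (male => female). Embeddings here are random
# placeholders standing in for a real model's vectors.
import numpy as np

rng = np.random.default_rng(1)
d = 256
words = ["king", "queen", "man", "woman", "actor", "actress"]
embed = {w: rng.normal(size=d) for w in words}  # placeholder embeddings

# Counterfactual pairs differing only in the target concept.
pairs = [("king", "queen"), ("man", "woman"), ("actor", "actress")]

# The concept's linear representation: the shared direction of the
# embedding differences across counterfactual pairs.
diffs = np.stack([embed[b] - embed[a] for a, b in pairs])
concept_dir = diffs.mean(axis=0)
concept_dir /= np.linalg.norm(concept_dir)

# Projecting a representation difference onto this direction then
# measures how much of the concept it carries.
print("projection of ('queen' - 'king') onto direction:",
      round(float((embed["queen"] - embed["king"]) @ concept_dir), 3))
```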
arXiv Detail & Related papers (2023-11-07T01:59:11Z) - Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs [77.10299848546717]
Concept2Box is a novel approach that jointly embeds the two views of a knowledge graph (KG).
Box embeddings capture the hierarchical structure and complex relations, such as overlap and disjointness, among concepts.
We propose a novel vector-to-box distance metric and learn both embeddings jointly.
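For intuition, the sketch below computes a generic vector-to-box distance under the common (center, offset) box parameterization; this is an illustrative construction, not necessarily Concept2Box's exact metric.

```python
# Generic vector-to-box distance sketch: a box contains a point when the
# point lies within center +/- offset in every dimension. Illustrative
# only; not claimed to be Concept2Box's exact metric.
import numpy as np

def vector_to_box_distance(v, center, offset):
    """Euclidean distance from v to the nearest point of the box."""
    lower, upper = center - offset, center + offset
    # Componentwise distance to the box surface; zero inside the box.
    outside = np.maximum(0.0, np.maximum(lower - v, v - upper))
    return float(np.linalg.norm(outside))

# Toy example: a 3-D box for a concept and two entity vectors.
center = np.array([0.0, 1.0, 0.5])
offset = np.array([0.5, 0.5, 0.5])
print(vector_to_box_distance(np.array([0.2, 1.1, 0.4]), center, offset))  # 0.0 (inside)
print(vector_to_box_distance(np.array([2.0, 1.0, 0.5]), center, offset))  # 1.5
```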
arXiv Detail & Related papers (2023-07-04T21:37:39Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Succinct Representations for Concepts [12.134564449202708]
Foundation models like ChatGPT have demonstrated remarkable performance on various tasks.
However, for many questions, they may produce false answers that look accurate.
In this paper, we introduce succinct representations of concepts based on category theory.
arXiv Detail & Related papers (2023-03-01T12:11:23Z) - Concept Algebra for (Score-Based) Text-Controlled Generative Models [27.725860408234478]
This paper concerns the structure of learned representations in text-guided generative models.
A key property of such models is that they can compose disparate concepts in a 'disentangled' manner.
Here, we focus on the idea that concepts are encoded as subspaces of some representation space.
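The subspace view suggests a simple editing recipe, sketched below under illustrative assumptions (random vectors, a hypothetical 2-dimensional concept subspace): swap only a representation's component within the concept subspace, leaving the orthogonal complement untouched. This is a generic projection sketch, not the paper's exact concept-algebra operation.

```python
# Hedged sketch of the subspace view of concepts: edit a concept by
# replacing a representation's component in the concept subspace with
# that of a target representation. All vectors here are random
# placeholders; the subspace is hypothetical.
import numpy as np

rng = np.random.default_rng(2)
d = 64
# Orthonormal basis of a (hypothetical) 2-D concept subspace.
Z = np.linalg.qr(rng.normal(size=(d, 2)))[0]

def project(v, basis):
    """Orthogonal projection of v onto span(basis)."""
    return basis @ (basis.T @ v)

rep = rng.normal(size=d)     # representation of the original input
target = rng.normal(size=d)  # representation carrying the desired concept value

# Swap only the concept-subspace component, leaving the rest untouched.
edited = rep - project(rep, Z) + project(target, Z)
print("component changed:",
      np.allclose(project(edited, Z), project(target, Z)))
```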
arXiv Detail & Related papers (2023-02-07T20:43:48Z) - On the Complexity of Bayesian Generalization [141.21610899086392]
We consider concept generalization at a large scale across the diverse and natural visual spectrum.
We study two modes of generalization as the problem space scales up and the complexity of concepts becomes diverse.
arXiv Detail & Related papers (2022-11-20T17:21:37Z) - Latent Topology Induction for Understanding Contextualized Representations [84.7918739062235]
We study the representation space of contextualized embeddings and gain insight into the hidden topology of large language models.
We show there exists a network of latent states that summarize linguistic properties of contextualized representations.
arXiv Detail & Related papers (2022-06-03T11:22:48Z) - Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z) - Towards Visual Semantics [17.1623244298824]
We study how humans build mental representations, i.e., concepts, of what they visually perceive.
In this paper, we provide a theory and an algorithm that learns substance concepts corresponding to the concepts we call classification concepts.
The experiments, though preliminary, show that the algorithm manages to acquire the notions of Genus and Differentia with reasonable accuracy.
arXiv Detail & Related papers (2021-04-26T07:28:02Z) - The ERA of FOLE: Foundation [0.0]
This paper continues the discussion of the representation and interpretation of the first-order logical environment FOLE.
The formalism and semantics of (many-sorted) first-order logic can be developed in both a classification form and an interpretation form.
In general, the FOLE representation uses a conceptual approach that is completely compatible with the theory of institutions, formal concept analysis, and information flow.
arXiv Detail & Related papers (2015-12-23T11:00:15Z)