The Linear Representation Hypothesis and the Geometry of Large Language Models
- URL: http://arxiv.org/abs/2311.03658v2
- Date: Wed, 17 Jul 2024 22:24:27 GMT
- Title: The Linear Representation Hypothesis and the Geometry of Large Language Models
- Authors: Kiho Park, Yo Joong Choe, Victor Veitch
- Abstract summary: Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space.
This paper addresses two closely related questions: what does "linear representation" actually mean, and how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space?
We show how to unify all notions of linear representation using counterfactual pairs.
- Score: 12.387530469788738
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product.
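To make these constructions concrete, the following is a minimal numpy sketch under placeholder assumptions rather than the paper's actual setup: the unembedding matrix, the counterfactual pair indices, and the hidden state are all synthetic stand-ins. It estimates a concept direction from counterfactual word pairs, forms a candidate causal inner product by whitening with the inverse covariance of the unembedding vectors, and uses the result both as a linear probe and as a steering direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-in for a model's unembedding matrix (vocab_size x d).
# In the paper's experiments this would come from LLaMA-2; here it is random.
vocab_size, d = 1000, 64
gamma = rng.normal(size=(vocab_size, d))

# Hypothetical counterfactual pairs for one concept (e.g. singular -> plural):
# indices of word pairs that differ only in the target concept.
pairs = [(1, 2), (10, 11), (42, 43), (100, 101)]

# Concept direction in the output (word) representation space: the average
# difference of unembedding vectors over the counterfactual pairs.
diffs = np.stack([gamma[j] - gamma[i] for i, j in pairs])
concept_dir = diffs.mean(axis=0)

# Candidate causal inner product <x, y>_C = x^T A y, with A taken here as the
# inverse covariance of the unembedding vectors (a whitening-style estimate).
cov = np.cov(gamma, rowvar=False)
A = np.linalg.inv(cov + 1e-6 * np.eye(d))  # small ridge for numerical stability

def causal_inner(x, y):
    return x @ A @ y

# Linear probe: score a hidden state h by its causal inner product with the
# concept direction.
h = rng.normal(size=d)  # stand-in for a sentence (input-space) representation
score_before = causal_inner(h, concept_dir)

# Steering edit: move the representation along A @ concept_dir, the direction
# matched to the concept under this inner product, with strength alpha.
alpha = 2.0
h_steered = h + alpha * (A @ concept_dir)
score_after = causal_inner(h_steered, concept_dir)

print(f"probe score before steering: {score_before:.3f}")
print(f"probe score after steering:  {score_after:.3f}")
```

With real model weights, the same recipe would substitute the model's actual unembedding vectors and counterfactual token pairs for the random stand-ins; the whitening matrix here is only one plausible estimate of a causal inner product.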
Related papers
- A Complexity-Based Theory of Compositionality [53.025566128892066]
In AI, compositional representations can enable a powerful form of out-of-distribution generalization.
Here, we propose a formal definition of compositionality that accounts for and extends our intuitions about it.
The definition is conceptually simple, quantitative, grounded in algorithmic information theory, and applicable to any representation.
arXiv Detail & Related papers (2024-10-18T18:37:27Z)
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models [15.126806053878855]
We show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors.
We use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations.
We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
arXiv Detail & Related papers (2024-06-03T16:34:01Z)
- Transport of Algebraic Structure to Latent Embeddings [8.693845596949892]
Machine learning often aims to produce latent embeddings of inputs which lie in a larger, abstract mathematical space.
How can we learn to "union" two sets using only their latent embeddings while respecting associativity?
We propose a general procedure for parameterizing latent space operations that are provably consistent with the laws on the input space.
arXiv Detail & Related papers (2024-05-27T02:24:57Z)
- Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference [110.47649327040392]
Given time series data, how can we answer questions like "what will happen in the future?" and "how did we get here?"
We show how these questions can have compact, closed form solutions in terms of learned representations.
arXiv Detail & Related papers (2024-03-06T22:27:30Z)
- On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of next-token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z)
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that intermediate representations can yield the largest MI estimates because of the tradeoff between better separability and decreasing MI.
arXiv Detail & Related papers (2023-12-15T18:38:18Z)
- Quantum and Reality [0.0]
We describe a natural emergence of Hermiticity which is rooted in principles of equivariant homotopy theory.
This construction of Hermitian forms requires of the ambient linear type theory nothing further than a negative unit term of tensor unit type.
We show how this allows for encoding (and verifying) the unitarity of quantum gates and of quantum channels in quantum languages embedded into LHoTT.
arXiv Detail & Related papers (2023-11-18T11:00:12Z)
- A Geometric Notion of Causal Probing [91.14470073637236]
The linear subspace hypothesis states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.
We give a set of intrinsic criteria which characterize an ideal linear concept subspace.
We find that LEACE returns a one-dimensional subspace containing roughly half of total concept information; a generic sketch of this kind of subspace erasure appears after this list.
arXiv Detail & Related papers (2023-07-27T17:57:57Z)
- On the Complexity of Representation Learning in Contextual Linear Bandits [110.84649234726442]
We show that representation learning is fundamentally more complex than linear bandits.
In particular, learning with a given set of representations is never simpler than learning with the worst realizable representation in the set.
arXiv Detail & Related papers (2022-12-19T13:08:58Z)
- Should Semantic Vector Composition be Explicit? Can it be Linear? [5.6031349532829955]
Vector representations have become a central element in semantic language modelling.
How should the concept 'wet fish' be represented?
This paper surveys this question from two points of view.
arXiv Detail & Related papers (2021-04-13T23:58:26Z)
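As referenced in the causal-probing entry above, here is a small numpy sketch of generic one-dimensional concept erasure: estimate a concept direction from labeled representations, project it out, and check how much a linear probe degrades. This is an illustration on synthetic data under assumed dimensions and labels, not an implementation of LEACE.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "hidden states" with a binary concept (e.g. singular vs. plural)
# injected along a single random direction. Purely illustrative.
n, d = 2000, 32
y = rng.integers(0, 2, size=n)                       # concept labels
concept_axis = rng.normal(size=d)
concept_axis /= np.linalg.norm(concept_axis)
X = rng.normal(size=(n, d)) + 3.0 * np.outer(y - 0.5, concept_axis)

def probe_accuracy(X, y):
    """In-sample accuracy of a closed-form least-squares linear probe."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # add a bias column
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
    return np.mean((Xb @ w > 0) == y)

# One-dimensional concept subspace, estimated as the difference of class means.
u = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
u /= np.linalg.norm(u)

# "Erase" the concept by orthogonally projecting out that direction.
X_erased = X - np.outer(X @ u, u)

print(f"probe accuracy before erasure: {probe_accuracy(X, y):.3f}")
print(f"probe accuracy after erasure:  {probe_accuracy(X_erased, y):.3f}")
```

Because the synthetic noise is isotropic, removing the single mean-difference direction drives the probe back to near-chance accuracy. Real representations have correlated, anisotropic noise, so a raw mean-difference direction generally would not remove all linearly available information; this sketch only illustrates the project-and-reprobe pattern.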