On the Origins of Linear Representations in Large Language Models
- URL: http://arxiv.org/abs/2403.03867v1
- Date: Wed, 6 Mar 2024 17:17:36 GMT
- Title: On the Origins of Linear Representations in Large Language Models
- Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch
- Abstract summary: We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
- Score: 51.88404605700344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have argued that high-level semantic concepts are encoded
"linearly" in the representation space of large language models. In this work,
we study the origins of such linear representations. To that end, we introduce
a simple latent variable model to abstract and formalize the concept dynamics
of the next token prediction. We use this formalism to show that the next token
prediction objective (softmax with cross-entropy) and the implicit bias of
gradient descent together promote the linear representation of concepts.
Experiments show that linear representations emerge when learning from data
matching the latent variable model, confirming that this simple structure
already suffices to yield linear representations. We additionally confirm some
predictions of the theory using the LLaMA-2 large language model, giving
evidence that the simplified model yields generalizable insights.
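As a rough, self-contained illustration of the claim, the sketch below (Python/PyTorch) trains a small next-token predictor with softmax cross-entropy on data drawn from a toy binary-concept latent variable model, then checks whether each concept is linearly decodable from the learned representation via a difference-of-means direction. The data-generating process, architecture, and probe are illustrative assumptions, not the authors' exact model or experimental setup.

    # Toy sketch (illustrative, not the paper's setup): binary latent concepts,
    # softmax/cross-entropy next-token training, then a difference-of-means
    # linearity check per concept.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_concepts, n_tokens, dim, n_samples = 4, 16, 32, 4000

    # Latent concepts: each sample is a binary vector z in {0,1}^n_concepts.
    z = torch.randint(0, 2, (n_samples, n_concepts)).float()
    # "Context" seen by the model: a fixed random nonlinear function of z.
    W_ctx = torch.randn(n_concepts, 64)
    context = torch.tanh(z @ W_ctx)
    # The next token is fully determined by the concepts (the integer they encode).
    next_token = (z @ (2 ** torch.arange(n_concepts)).float()).long()

    # Small encoder plus softmax readout trained with cross-entropy.
    encoder = nn.Sequential(nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))
    readout = nn.Linear(dim, n_tokens)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(readout.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(1000):
        opt.zero_grad()
        loss_fn(readout(encoder(context)), next_token).backward()
        opt.step()

    # Linearity check: for each concept, the difference of representation means
    # between z_i = 1 and z_i = 0 serves as a candidate concept direction.
    with torch.no_grad():
        reps = encoder(context)
        for i in range(n_concepts):
            direction = reps[z[:, i] == 1].mean(0) - reps[z[:, i] == 0].mean(0)
            scores = reps @ direction
            preds = (scores > scores.median()).float()
            acc = (preds == z[:, i]).float().mean().item()
            print(f"concept {i}: linear separation accuracy ~ {acc:.2f}")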
Related papers
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models [15.126806053878855]
We show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors.
We use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations.
We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
arXiv Detail & Related papers (2024-06-03T16:34:01Z)
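One concrete reading of "estimating representations" for such concepts is a difference-of-means estimate: average the representations of examples where a feature holds, subtract the average where it does not, and compare the geometry (e.g., cosine similarity) of two such directions. The sketch below is a hypothetical, model-agnostic outline; embed, the word lists, and the random vectors are placeholders, not the paper's estimator or an actual Gemma/LLaMA-3 interface.

    # Hypothetical sketch: concept vectors as differences of mean representations.
    # `embed` is a stand-in for whatever maps a word to a model representation.
    import numpy as np

    rng = np.random.default_rng(0)

    def embed(word):
        # Placeholder: random vectors instead of real model activations.
        return rng.standard_normal(64)

    animals = ["dog", "cat", "sparrow", "trout"]
    non_animals = ["rock", "chair", "cloud", "spoon"]
    birds = ["sparrow", "eagle", "owl", "finch"]
    non_birds = ["dog", "cat", "trout", "rock"]

    def concept_vector(positives, negatives):
        pos = np.mean([embed(w) for w in positives], axis=0)
        neg = np.mean([embed(w) for w in negatives], axis=0)
        return pos - neg

    v_animal = concept_vector(animals, non_animals)
    v_bird = concept_vector(birds, non_birds)
    cos = v_animal @ v_bird / (np.linalg.norm(v_animal) * np.linalg.norm(v_bird))
    print("cosine(is_animal, is_bird) =", round(float(cos), 3))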
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Emergent Linear Representations in World Models of Self-Supervised Sequence Models [5.712566125397807]
Prior work found that an Othello-playing neural network learned nonlinear models of the board state.
We show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state.
arXiv Detail & Related papers (2023-09-02T13:37:34Z)
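Such a probe is, in effect, a logistic regression from the network's hidden states to a binary square label. The sketch below illustrates the mechanics with synthetic stand-in activations; it does not reproduce the paper's Othello-GPT data or code.

    # Illustrative linear probe (synthetic data, not the paper's activations):
    # fit logistic regression from hidden states to a binary "my colour" label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, d = 5000, 128
    true_direction = rng.standard_normal(d)
    hidden_states = rng.standard_normal((n, d))                # stand-in activations
    labels = (hidden_states @ true_direction > 0).astype(int)  # stand-in square label

    X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # High held-out accuracy indicates the label is (approximately) linearly encoded.
    print("probe accuracy:", probe.score(X_te, y_te))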
- Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning-theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models [6.435660232678891]
We develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome.
Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models.
arXiv Detail & Related papers (2022-10-19T19:28:09Z)
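As a toy illustration of describing dependence through binary expansions (a simplification for intuition, not the BELIEF estimator itself), the sketch below thresholds each continuous predictor at quantiles to obtain binary "bits" and fits an ordinary logistic model on those bits; interactions among bits, which the BELIEF framework reasons about, would be needed to capture the XOR-style structure in this example.

    # Toy sketch: binary expansion of predictors followed by a linear model on bits.
    # This is an illustration of the general idea, not the BELIEF methodology.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = ((np.abs(x1) > 1) ^ (x2 > 0)).astype(int)  # nonlinear, non-monotone relationship

    def binary_expansion(x, n_bits=3):
        # Threshold x at 2**n_bits - 1 quantiles; return the bin index as n_bits binary columns.
        qs = np.quantile(x, np.linspace(0, 1, 2 ** n_bits + 1)[1:-1])
        idx = np.searchsorted(qs, x)
        return np.stack([(idx >> k) & 1 for k in range(n_bits)], axis=1)

    bits = np.hstack([binary_expansion(x1), binary_expansion(x2)])
    model = LogisticRegression(max_iter=1000).fit(bits, y)
    # Each coefficient is the effect of a single bit, readable as in a linear model.
    print("coefficients per bit:", np.round(model.coef_, 2))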
- Linear Disentangled Representations and Unsupervised Action Estimation [2.793095554369282]
We show that linear disentangled representations are not generally present in standard VAE models.
We propose a method to induce irreducible representations which forgoes the need for labelled action sequences.
arXiv Detail & Related papers (2020-08-18T13:23:57Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix applicability is confirmed via different examples, showing how it can be used in practice to promote RF model interpretability.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
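The metaphor can be made concrete by enumerating the root-to-leaf rules of each tree and recording, per rule, the interval it imposes on each feature. The sketch below uses scikit-learn's tree internals to build such a rule-by-feature table; it reproduces only this data layout, not the ExMatrix visualization itself, and the dataset and forest settings are arbitrary.

    # Sketch: enumerate root-to-leaf rules of a random forest and lay them out as
    # a rule-by-feature table (rows = rules, columns = features, cells = predicates).
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)

    rules = []  # each entry: ({feature_index: (low, high)}, predicted_class)
    for est in forest.estimators_:
        t = est.tree_

        def walk(node, constraints):
            if t.children_left[node] == -1:  # leaf node
                rules.append((dict(constraints), int(np.argmax(t.value[node]))))
                return
            f, thr = t.feature[node], t.threshold[node]
            lo, hi = constraints.get(f, (-np.inf, np.inf))
            walk(t.children_left[node], {**constraints, f: (lo, min(hi, thr))})   # x[f] <= thr
            walk(t.children_right[node], {**constraints, f: (max(lo, thr), hi)})  # x[f] > thr

        walk(0, {})

    # One row per rule, one column per feature; None means the rule does not use the feature.
    matrix = [[constraints.get(j) for j in range(X.shape[1])] for constraints, _ in rules]
    print(len(rules), "rules; first row:", matrix[0])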
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
- An interpretable neural network model through piecewise linear approximation [7.196650216279683]
We propose a hybrid interpretable model that combines a piecewise linear component and a nonlinear component.
The first component describes the explicit feature contributions by piecewise linear approximation to increase the expressiveness of the model.
The other component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, and increase the prediction performance.
arXiv Detail & Related papers (2020-01-20T14:32:11Z)
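A hybrid of this kind can be sketched directly: an additive piecewise-linear component per feature (whose contributions can be read off) plus a small multi-layer perceptron that captures interactions, with the two outputs summed. The knots, layer sizes, and other details below are illustrative assumptions, not the paper's architecture.

    # Illustrative hybrid model (not the paper's exact architecture): per-feature
    # piecewise-linear contributions plus an MLP for interactions, summed together.
    import torch
    import torch.nn as nn

    class PiecewiseLinearFeature(nn.Module):
        # Piecewise-linear contribution of one feature via fixed knots and ReLU hinges.
        def __init__(self, knots):
            super().__init__()
            self.register_buffer("knots", torch.as_tensor(knots, dtype=torch.float32))
            self.weights = nn.Parameter(torch.zeros(len(knots) + 1))

        def forward(self, x):  # x: (batch,)
            hinges = torch.relu(x.unsqueeze(1) - self.knots)  # (batch, n_knots)
            return self.weights[0] * x + hinges @ self.weights[1:]

    class HybridModel(nn.Module):
        def __init__(self, n_features, knots=(-1.0, 0.0, 1.0), hidden=32):
            super().__init__()
            self.pieces = nn.ModuleList([PiecewiseLinearFeature(knots) for _ in range(n_features)])
            self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):  # x: (batch, n_features)
            explicit = sum(piece(x[:, j]) for j, piece in enumerate(self.pieces))
            return explicit + self.mlp(x).squeeze(-1)

    model = HybridModel(n_features=5)
    print(model(torch.randn(8, 5)).shape)  # torch.Size([8])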
This list is automatically generated from the titles and abstracts of the papers on this site.