On the Origins of Linear Representations in Large Language Models
- URL: http://arxiv.org/abs/2403.03867v1
- Date: Wed, 6 Mar 2024 17:17:36 GMT
- Title: On the Origins of Linear Representations in Large Language Models
- Authors: Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch
- Abstract summary: We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
- Score: 51.88404605700344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have argued that high-level semantic concepts are encoded
"linearly" in the representation space of large language models. In this work,
we study the origins of such linear representations. To that end, we introduce
a simple latent variable model to abstract and formalize the concept dynamics
of the next token prediction. We use this formalism to show that the next token
prediction objective (softmax with cross-entropy) and the implicit bias of
gradient descent together promote the linear representation of concepts.
Experiments show that linear representations emerge when learning from data
matching the latent variable model, confirming that this simple structure
already suffices to yield linear representations. We additionally confirm some
predictions of the theory using the LLaMA-2 large language model, giving
evidence that the simplified model yields generalizable insights.
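As a rough, self-contained illustration of the claim, the sketch below (Python/PyTorch) trains a small next-token predictor with softmax cross-entropy on data drawn from a toy binary-concept latent variable model, then checks whether each concept is linearly decodable from the learned representation via a difference-of-means direction. The data-generating process, architecture, and probe are illustrative assumptions, not the authors' exact model or experimental setup.

    # Toy sketch (illustrative, not the paper's setup): binary latent concepts,
    # softmax/cross-entropy next-token training, then a difference-of-means
    # linearity check per concept.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_concepts, n_tokens, dim, n_samples = 4, 16, 32, 4000

    # Latent concepts: each sample is a binary vector z in {0,1}^n_concepts.
    z = torch.randint(0, 2, (n_samples, n_concepts)).float()
    # "Context" seen by the model: a fixed random nonlinear function of z.
    W_ctx = torch.randn(n_concepts, 64)
    context = torch.tanh(z @ W_ctx)
    # The next token is fully determined by the concepts (the integer they encode).
    next_token = (z @ (2 ** torch.arange(n_concepts)).float()).long()

    # Small encoder plus softmax readout trained with cross-entropy.
    encoder = nn.Sequential(nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))
    readout = nn.Linear(dim, n_tokens)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(readout.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(1000):
        opt.zero_grad()
        loss_fn(readout(encoder(context)), next_token).backward()
        opt.step()

    # Linearity check: for each concept, the difference of representation means
    # between z_i = 1 and z_i = 0 serves as a candidate concept direction.
    with torch.no_grad():
        reps = encoder(context)
        for i in range(n_concepts):
            direction = reps[z[:, i] == 1].mean(0) - reps[z[:, i] == 0].mean(0)
            scores = reps @ direction
            preds = (scores > scores.median()).float()
            acc = (preds == z[:, i]).float().mean().item()
            print(f"concept {i}: linear separation accuracy ~ {acc:.2f}")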
Related papers
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models [15.126806053878855]
We show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors.
We use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations.
We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
arXiv Detail & Related papers (2024-06-03T16:34:01Z)
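One concrete reading of "estimating representations" for such concepts is a difference-of-means estimate: average the representations of examples where a feature holds, subtract the average where it does not, and compare the geometry (e.g., cosine similarity) of two such directions. The sketch below is a hypothetical, model-agnostic outline; embed, the word lists, and the random vectors are placeholders, not the paper's estimator or an actual Gemma/LLaMA-3 interface.

    # Hypothetical sketch: concept vectors as differences of mean representations.
    # `embed` is a stand-in for whatever maps a word to a model representation.
    import numpy as np

    rng = np.random.default_rng(0)

    def embed(word):
        # Placeholder: random vectors instead of real model activations.
        return rng.standard_normal(64)

    animals = ["dog", "cat", "sparrow", "trout"]
    non_animals = ["rock", "chair", "cloud", "spoon"]
    birds = ["sparrow", "eagle", "owl", "finch"]
    non_birds = ["dog", "cat", "trout", "rock"]

    def concept_vector(positives, negatives):
        pos = np.mean([embed(w) for w in positives], axis=0)
        neg = np.mean([embed(w) for w in negatives], axis=0)
        return pos - neg

    v_animal = concept_vector(animals, non_animals)
    v_bird = concept_vector(birds, non_birds)
    cos = v_animal @ v_bird / (np.linalg.norm(v_animal) * np.linalg.norm(v_bird))
    print("cosine(is_animal, is_bird) =", round(float(cos), 3))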
- An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
- Emergent Linear Representations in World Models of Self-Supervised Sequence Models [5.712566125397807]
Prior work found that an Othello-playing neural network learned nonlinear models of the board state.
We show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state.
arXiv Detail & Related papers (2023-09-02T13:37:34Z)
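Such a probe is, in effect, a logistic regression from the network's hidden states to a binary square label. The sketch below illustrates the mechanics with synthetic stand-in activations; it does not reproduce the paper's Othello-GPT data or code.

    # Illustrative linear probe (synthetic data, not the paper's activations):
    # fit logistic regression from hidden states to a binary "my colour" label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, d = 5000, 128
    true_direction = rng.standard_normal(d)
    hidden_states = rng.standard_normal((n, d))                # stand-in activations
    labels = (hidden_states @ true_direction > 0).astype(int)  # stand-in square label

    X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, labels, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # High held-out accuracy indicates the label is (approximately) linearly encoded.
    print("probe accuracy:", probe.score(X_te, y_te))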
- Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning-theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models [6.435660232678891]
We develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome.
Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models.
arXiv Detail & Related papers (2022-10-19T19:28:09Z)
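As a toy illustration of describing dependence through binary expansions (a simplification for intuition, not the BELIEF estimator itself), the sketch below thresholds each continuous predictor at quantiles to obtain binary "bits" and fits an ordinary logistic model on those bits; interactions among bits, which the BELIEF framework reasons about, would be needed to capture the XOR-style structure in this example.

    # Toy sketch: binary expansion of predictors followed by a linear model on bits.
    # This is an illustration of the general idea, not the BELIEF methodology.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = ((np.abs(x1) > 1) ^ (x2 > 0)).astype(int)  # nonlinear, non-monotone relationship

    def binary_expansion(x, n_bits=3):
        # Threshold x at 2**n_bits - 1 quantiles; return the bin index as n_bits binary columns.
        qs = np.quantile(x, np.linspace(0, 1, 2 ** n_bits + 1)[1:-1])
        idx = np.searchsorted(qs, x)
        return np.stack([(idx >> k) & 1 for k in range(n_bits)], axis=1)

    bits = np.hstack([binary_expansion(x1), binary_expansion(x2)])
    model = LogisticRegression(max_iter=1000).fit(bits, y)
    # Each coefficient is the effect of a single bit, readable as in a linear model.
    print("coefficients per bit:", np.round(model.coef_, 2))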
- Linear Disentangled Representations and Unsupervised Action Estimation [2.793095554369282]
We show that linear disentangled representations are not generally present in standard VAE models.
We propose a method to induce irreducible representations which forgoes the need for labelled action sequences.
arXiv Detail & Related papers (2020-08-18T13:23:57Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix applicability is confirmed via different examples, showing how it can be used in practice to promote RF model interpretability.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
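The metaphor can be made concrete by enumerating the root-to-leaf rules of each tree and recording, per rule, the interval it imposes on each feature. The sketch below uses scikit-learn's tree internals to build such a rule-by-feature table; it reproduces only this data layout, not the ExMatrix visualization itself, and the dataset and forest settings are arbitrary.

    # Sketch: enumerate root-to-leaf rules of a random forest and lay them out as
    # a rule-by-feature table (rows = rules, columns = features, cells = predicates).
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    forest = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)

    rules = []  # each entry: ({feature_index: (low, high)}, predicted_class)
    for est in forest.estimators_:
        t = est.tree_

        def walk(node, constraints):
            if t.children_left[node] == -1:  # leaf node
                rules.append((dict(constraints), int(np.argmax(t.value[node]))))
                return
            f, thr = t.feature[node], t.threshold[node]
            lo, hi = constraints.get(f, (-np.inf, np.inf))
            walk(t.children_left[node], {**constraints, f: (lo, min(hi, thr))})   # x[f] <= thr
            walk(t.children_right[node], {**constraints, f: (max(lo, thr), hi)})  # x[f] > thr

        walk(0, {})

    # One row per rule, one column per feature; None means the rule does not use the feature.
    matrix = [[constraints.get(j) for j in range(X.shape[1])] for constraints, _ in rules]
    print(len(rules), "rules; first row:", matrix[0])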
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
- An interpretable neural network model through piecewise linear approximation [7.196650216279683]
We propose a hybrid interpretable model that combines a piecewise linear component and a nonlinear component.
The first component describes the explicit feature contributions by piecewise linear approximation to increase the expressiveness of the model.
The other component uses a multi-layer perceptron to capture feature interactions and implicit nonlinearity, and increase the prediction performance.
arXiv Detail & Related papers (2020-01-20T14:32:11Z)
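A hybrid of this kind can be sketched directly: an additive piecewise-linear component per feature (whose contributions can be read off) plus a small multi-layer perceptron that captures interactions, with the two outputs summed. The knots, layer sizes, and other details below are illustrative assumptions, not the paper's architecture.

    # Illustrative hybrid model (not the paper's exact architecture): per-feature
    # piecewise-linear contributions plus an MLP for interactions, summed together.
    import torch
    import torch.nn as nn

    class PiecewiseLinearFeature(nn.Module):
        # Piecewise-linear contribution of one feature via fixed knots and ReLU hinges.
        def __init__(self, knots):
            super().__init__()
            self.register_buffer("knots", torch.as_tensor(knots, dtype=torch.float32))
            self.weights = nn.Parameter(torch.zeros(len(knots) + 1))

        def forward(self, x):  # x: (batch,)
            hinges = torch.relu(x.unsqueeze(1) - self.knots)  # (batch, n_knots)
            return self.weights[0] * x + hinges @ self.weights[1:]

    class HybridModel(nn.Module):
        def __init__(self, n_features, knots=(-1.0, 0.0, 1.0), hidden=32):
            super().__init__()
            self.pieces = nn.ModuleList([PiecewiseLinearFeature(knots) for _ in range(n_features)])
            self.mlp = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):  # x: (batch, n_features)
            explicit = sum(piece(x[:, j]) for j, piece in enumerate(self.pieces))
            return explicit + self.mlp(x).squeeze(-1)

    model = HybridModel(n_features=5)
    print(model(torch.randn(8, 5)).shape)  # torch.Size([8])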
This list is automatically generated from the titles and abstracts of the papers on this site.