Function Vectors in Large Language Models
- URL: http://arxiv.org/abs/2310.15213v2
- Date: Sun, 25 Feb 2024 18:32:18 GMT
- Title: Function Vectors in Large Language Models
- Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
- Abstract summary: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs).
Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV).
- Score: 45.267194267587435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We report the presence of a simple neural mechanism that represents an
input-output function as a vector within autoregressive transformer language
models (LMs). Using causal mediation analysis on a diverse range of
in-context-learning (ICL) tasks, we find that a small number of attention heads
transport a compact representation of the demonstrated task, which we call a
function vector (FV). FVs are robust to changes in context, i.e., they trigger
execution of the task on inputs such as zero-shot and natural text settings
that do not resemble the ICL contexts from which they are collected. We test
FVs across a range of tasks, models, and layers and find strong causal effects
across settings in middle layers. We investigate the internal structure of FVs
and find that while they often contain information that encodes the output
space of the function, this information alone is not sufficient to reconstruct
an FV. Finally, we test semantic vector composition in FVs, and find that to
some extent they can be summed to create vectors that trigger new complex
tasks. Our findings show that compact, causal internal vector representations
of function abstractions can be explicitly extracted from LLMs. Our code and
data are available at https://functions.baulab.info.
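The abstract does not spell out how an FV is built or applied, so the following is only a minimal sketch of one plausible reading: average each selected attention head's output over a set of ICL prompts, sum those averages into a single vector, and add that vector to a hidden state in a new context. All names here (`build_function_vector`, `inject`, the `(layer, head)` keys) and the toy numbers are hypothetical and not taken from the authors' code; no real model is involved.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Head = Tuple[int, int]  # hypothetical (layer, head index) identifier


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def build_function_vector(
    head_outputs: Dict[Head, List[Vector]],
    selected_heads: List[Head],
) -> Vector:
    """Sum, over the selected heads, each head's mean output across prompts.

    Assumption: head_outputs[h] holds one output vector per ICL prompt,
    already expressed in the hidden-state space.
    """
    means = [mean_vector(head_outputs[h]) for h in selected_heads]
    fv = means[0]
    for m in means[1:]:
        fv = add_vectors(fv, m)
    return fv


def inject(hidden_state: Vector, fv: Vector) -> Vector:
    """Add the function vector to a hidden state (e.g. at a middle layer)."""
    return add_vectors(hidden_state, fv)


# Toy data: two heads, two ICL prompts each, hidden size 2.
toy_outputs = {
    (9, 1): [[1.0, 0.0], [3.0, 2.0]],
    (10, 4): [[0.0, 1.0], [0.0, 3.0]],
}
toy_fv = build_function_vector(toy_outputs, [(9, 1), (10, 4)])
toy_hidden = inject([0.5, 0.5], toy_fv)
```

Under the same assumptions, the composition experiment mentioned in the abstract would correspond to calling `add_vectors` on two such vectors before injecting the result.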
Related papers
- Interpreting Attention Layer Outputs with Sparse Autoencoders [3.201633659481912]
Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability.
In this work we train SAEs on attention layer outputs and show that here, too, SAEs find a sparse, interpretable decomposition.
We show that Sparse Autoencoders are a useful tool that enables researchers to explain model behavior in greater detail than prior work.
arXiv Detail & Related papers (2024-06-25T17:43:13Z)
- Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We find that transformer language models (LMs) pass features from early layers to later layers.
By analyzing a particular mechanism LMs use to accomplish this, we find that it is also used to recall items from a list.
Our analysis reveals a surprisingly intricate interpretable structure learned from language model pretraining.
arXiv Detail & Related papers (2024-06-13T18:12:01Z)
- How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps toward understanding in-context learning (ICL) in more complex scenarios by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
- FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the method shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Computing on Functions Using Randomized Vector Representations [4.066849397181077]
We call this new function encoding and computing framework Vector Function Architecture (VFA).
Our analyses and results suggest that VFAs constitute a powerful new framework for representing and manipulating functions in distributed neural systems.
arXiv Detail & Related papers (2021-09-08T04:39:48Z)
- How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text [2.881185491084005]
We learn a language model where syntactic structures are implicitly given.
We show that the context update vectors, i.e. outputs of internal gates, are approximately quantized to binary or ternary values.
For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures.
We also show that natural clusters of the function words and the parts of speech that trigger phrases are represented in a small but principal subspace of the context-update vector of the LSTM.
arXiv Detail & Related papers (2020-10-01T12:49:01Z)
- iffDetector: Inference-aware Feature Filtering for Object Detection [70.8678270164057]
We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors.
IFF performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features.
IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead.
arXiv Detail & Related papers (2020-06-23T02:57:29Z)
- On Bottleneck Features for Text-Dependent Speaker Verification Using X-vectors [20.829997825439886]
We study x-vectors for text-dependent speaker verification (TD-SV).
We investigate the impact of the different bottleneck (BN) features on the performance of x-vectors.
Experiments are conducted on the RedDots 2016 challenge database.
arXiv Detail & Related papers (2020-05-15T07:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.