Related papers: Function Vectors in Large Language Models

URL: http://arxiv.org/abs/2310.15213v2
Date: Sun, 25 Feb 2024 18:32:18 GMT
Title: Function Vectors in Large Language Models
Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau
Abstract summary: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV).
Score: 45.267194267587435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs. Our code and data are available at https://functions.baulab.info.
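
As a rough illustration of the idea in the abstract, the toy sketch below builds a vector from the averaged outputs of a few attention heads, adds it to a hidden state, and sums two such vectors. It is a minimal sketch on random arrays, not the authors' implementation (their code is at the URL above). The array shapes, the function names `mean_head_outputs`, `build_function_vector`, and `add_to_hidden_state`, the selected heads, and the construction of the vector as a plain sum of mean head outputs are all assumptions made for illustration.

```python
import numpy as np


def mean_head_outputs(head_outputs):
    """Average per-head outputs over a set of ICL prompts.

    head_outputs has the hypothetical shape
    (n_prompts, n_layers, n_heads, d_model).
    """
    return head_outputs.mean(axis=0)


def build_function_vector(mean_outputs, selected_heads):
    """Sum the mean outputs of the chosen (layer, head) pairs.

    Assumption: the vector is a plain sum over a small set of heads.
    """
    return np.stack([mean_outputs[l, h] for l, h in selected_heads]).sum(axis=0)


def add_to_hidden_state(hidden, fv, scale=1.0):
    """Add the vector to a hidden state of size d_model."""
    return hidden + scale * fv


# Random stand-ins for recorded head outputs on two tasks (not real data).
rng = np.random.default_rng(0)
outputs_task_a = rng.normal(size=(8, 4, 6, 16))
outputs_task_b = rng.normal(size=(8, 4, 6, 16))

# Hypothetical head selection; in practice this would come from an analysis
# such as the causal mediation analysis mentioned in the abstract.
selected_heads = [(1, 2), (2, 0)]

mean_a = mean_head_outputs(outputs_task_a)
mean_b = mean_head_outputs(outputs_task_b)
fv_a = build_function_vector(mean_a, selected_heads)
fv_b = build_function_vector(mean_b, selected_heads)

# Add one vector to a stand-in hidden state.
hidden = rng.normal(size=16)
steered = add_to_hidden_state(hidden, fv_a)

# Composition by summation, as the abstract describes "to some extent".
fv_composed = fv_a + fv_b
```

In a real model, the hidden state would be taken from a chosen layer during a forward pass; the abstract reports the strongest causal effects in middle layers.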

Related papers

The Complexity of Learning Sparse Superposed Features with Feedback [0.9838799448847586]
We investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent. We analyze the feedback complexity associated with learning a feature matrix in sparse settings. Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios.
arXiv Detail & Related papers (2025-02-08T01:54:23Z)
Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
Interpreting Attention Layer Outputs with Sparse Autoencoders [3.201633659481912]
Decomposing model activations into interpretable components is a key open problem in mechanistic interpretability. In this work we train SAEs on attention layer outputs and show that here, too, SAEs find a sparse, interpretable decomposition. We show that Sparse Autoencoders are a useful tool that enables researchers to explain model behavior in greater detail than prior work.
arXiv Detail & Related papers (2024-06-25T17:43:13Z)
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models [32.2976613483151]
We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task. We find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers.
arXiv Detail & Related papers (2024-06-13T18:12:01Z)
FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window. We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors. We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result. The method also shows a consistent efficiency gain on the recent transformer-based image recognition model ViT.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
Computing on Functions Using Randomized Vector Representations [4.066849397181077]
We call this new function encoding and computing framework Vector Function Architecture (VFA). Our analyses and results suggest that VFAs constitute a powerful new framework for representing and manipulating functions in distributed neural systems.
arXiv Detail & Related papers (2021-09-08T04:39:48Z)
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text [2.881185491084005]
We learn a language model where syntactic structures are implicitly given. We show that the context update vectors, i.e. outputs of internal gates, are approximately quantized to binary or ternary values. For some dimensions in the context vector, we show that their activations are highly correlated with the depth of phrase structures. We also show that natural clusters of the functional words and the parts of speech that trigger phrases are represented in a small but principal subspace of the context-update vector of the LSTM.
arXiv Detail & Related papers (2020-10-01T12:49:01Z)
iffDetector: Inference-aware Feature Filtering for Object Detection [70.8678270164057]
We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with modern detectors. IFF performs closed-loop optimization by leveraging high-level semantics to enhance the convolutional features. IFF can be fused with CNN-based object detectors in a plug-and-play manner with negligible computational cost overhead.
arXiv Detail & Related papers (2020-06-23T02:57:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.