Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
- URL: http://arxiv.org/abs/2501.15754v3
- Date: Mon, 10 Feb 2025 06:26:57 GMT
- Title: Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
- Authors: Go Kamoda, Benjamin Heinzerling, Tatsuro Inaba, Keito Kudo, Keisuke Sakaguchi, Kentaro Inui
- Abstract summary: We show that several important aspects of the detokenization stage can be understood purely by analyzing model weights.
Our decomposition yields interpretable terms that quantify the relative contributions of position-related, token-related, and mixed effects.
- Score: 30.31106907785379
- License:
- Abstract: According to the stages-of-inference hypothesis, early layers of language models map their subword-tokenized input, which does not necessarily correspond to a linguistically meaningful segmentation, to more meaningful representations that form the model's "inner vocabulary". Prior analysis of this detokenization stage has predominantly relied on probing and interventions such as path patching, which involve selecting particular inputs, choosing a subset of components that will be patched, and then observing changes in model behavior. Here, we show that several important aspects of the detokenization stage can be understood purely by analyzing model weights, without performing any model inference steps. Specifically, we introduce an analytical decomposition of first-layer attention in GPT-2. Our decomposition yields interpretable terms that quantify the relative contributions of position-related, token-related, and mixed effects. By focusing on terms in this decomposition, we discover weight-based explanations of attention bias toward close tokens and attention for detokenization.
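The abstract describes the decomposition only at a high level. As an illustration of the general idea (not necessarily the paper's exact formulation), the first-layer attention logit in GPT-2 can be expanded around the embedding sum x_i = wte[t_i] + wpe[i], which splits it into token-token, token-position, position-token, and position-position terms computable from weights alone. The sketch below assumes HuggingFace's GPT2Model and, for simplicity, ignores the pre-attention LayerNorm and projection biases, so it only approximates the quantities a faithful weight-based analysis would use.

```python
# Hedged sketch: weight-only decomposition of GPT-2 first-layer attention logits
# into token- and position-related terms. Simplifications (assumptions of this
# sketch, not the paper's exact method): pre-attention LayerNorm and biases are ignored.
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
d_model, n_head = 768, 12
d_head = d_model // n_head

E = model.wte.weight             # token embeddings    [50257, 768]
P = model.wpe.weight             # position embeddings [1024, 768]

# c_attn packs W_Q, W_K, W_V column-wise; its weight has shape [768, 3*768]
W = model.h[0].attn.c_attn.weight
W_Q, W_K = W[:, :d_model], W[:, d_model:2 * d_model]

head = 0                          # inspect a single attention head
cols = slice(head * d_head, (head + 1) * d_head)
Wq, Wk = W_Q[:, cols], W_K[:, cols]           # [768, 64] each

# With x_i = E[t] + P[i], the unnormalized attention logit between query
# position i (token t) and key position j (token s) expands as
#   (E_t + P_i) Wq Wk^T (E_s + P_j)^T / sqrt(d_head)
#     = E_t A E_s^T   (token-token)    + E_t A P_j^T  (token-position)
#     + P_i A E_s^T   (position-token) + P_i A P_j^T  (position-position)
# where A = Wq Wk^T / sqrt(d_head) is a weight-only bilinear form.
A = Wq @ Wk.T / d_head ** 0.5

with torch.no_grad():
    pos_pos = P[:64] @ A @ P[:64].T       # positional term over the first 64 positions
    tok_tok = E[:1000] @ A @ E[:1000].T   # token-token term on a small vocabulary slice

print(pos_pos.shape, tok_tok.shape)       # torch.Size([64, 64]) torch.Size([1000, 1000])
```

Even in this simplified form, the position-position term can be inspected for a distance-dependent pattern of the kind the abstract calls attention bias toward close tokens, while the token-token term captures which token pairs attract each other regardless of position.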
Related papers
- How Language Models Prioritize Contextual Grammatical Cues? [3.9790222241649587]
We investigate how language models handle gender agreement when multiple gender cue words are present.
Our findings reveal striking differences in how encoder-based and decoder-based models prioritize and use contextual information for their predictions.
arXiv Detail & Related papers (2024-10-04T14:09:05Z)
- The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is a critical step in the NLP pipeline.
Despite its recognized importance as a standard representation method in NLP, the theoretical underpinnings of tokenization are not yet fully understood.
The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.
arXiv Detail & Related papers (2024-07-16T11:12:28Z)
- Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions [9.909170013118775]
This work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token.
Using the change in next-word probability as a measure of importance, this work first examines which context words make the biggest contribution to language model predictions.
arXiv Detail & Related papers (2023-05-17T23:55:32Z)
- Estimating the Causal Effects of Natural Logic Features in Neural NLI Models [2.363388546004777]
We zero in on specific patterns of reasoning with enough structure and regularity to identify and quantify systematic reasoning failures in widely-used models.
We apply causal effect estimation strategies to measure the effect of context interventions.
Following related work on causal analysis of NLP models in different settings, we adapt the methodology for the NLI task to construct comparative model profiles.
arXiv Detail & Related papers (2023-05-15T12:01:09Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
- Probing for Bridging Inference in Transformer Language Models [15.216901057561428]
We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations.
We consider language models as a whole in our approach where bridging anaphora resolution is formulated as a masked token prediction task.
Our formulation produces optimistic results without any fine-tuning, which indicates that pre-trained language models substantially capture bridging inference.
arXiv Detail & Related papers (2021-04-19T15:42:24Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)