Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
- URL: http://arxiv.org/abs/2412.05353v1
- Date: Fri, 06 Dec 2024 18:54:54 GMT
- Title: Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
- Authors: Michael Hanna, Aaron Mueller
- Abstract summary: We study the mechanisms underlying garden path sentence processing in LMs. We find that while many important features relate to syntactic structure, some reflect syntactically irrelevant heuristics. While most active features correspond to one reading of the sentence, some correspond to the other, suggesting that LMs assign weight to both possibilities simultaneously.
- Score: 12.866627382118768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive transformer language models (LMs) possess strong syntactic abilities, often successfully handling phenomena from agreement to NPI licensing. However, the features they use to incrementally process language inputs are not well understood. In this paper, we fill this gap by studying the mechanisms underlying garden path sentence processing in LMs. We ask: (1) Do LMs use syntactic features or shallow heuristics to perform incremental sentence processing? (2) Do LMs represent only one potential interpretation, or multiple? and (3) Do LMs reanalyze or repair their initial incorrect representations? To address these questions, we use sparse autoencoders to identify interpretable features that determine which continuation - and thus which reading - of a garden path sentence the LM prefers. We find that while many important features relate to syntactic structure, some reflect syntactically irrelevant heuristics. Moreover, while most active features correspond to one reading of the sentence, some features correspond to the other, suggesting that LMs assign weight to both possibilities simultaneously. Finally, LMs do not re-use features from garden path sentence processing to answer follow-up questions.
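The abstract's core tool, a sparse autoencoder (SAE) trained on LM activations, can be sketched minimally. The following is a generic SAE forward pass and loss (reconstruction error plus an L1 sparsity penalty), not the authors' implementation; all sizes, names, and the coefficient value are hypothetical.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """One forward pass of a sparse autoencoder over LM activations.

    x: (batch, d_model) activations; the ReLU code f is the vector of
    feature activations that this kind of analysis would inspect.
    """
    f = np.maximum(0.0, x @ W_enc + b_enc)   # sparse feature activations
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    return f, x_hat

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity

rng = np.random.default_rng(0)
d_model, d_features, batch = 16, 64, 8   # toy sizes, not the paper's
x = rng.normal(size=(batch, d_model))
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
f, x_hat = sae_forward(x, W_enc, np.zeros(d_features), W_dec, np.zeros(d_model))
loss = sae_loss(x, x_hat, f)
```

Interpretability work in this vein inspects which entries of `f` fire on which inputs; here, which features determine the preferred continuation of a garden path sentence.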
Related papers
- (How) Do Language Models Track State? [50.516691979518164]
Transformer language models (LMs) exhibit behaviors that appear to require tracking the unobserved state of an evolving world.
We study state tracking in LMs trained or fine-tuned to compose permutations.
We show that LMs consistently learn one of two state tracking mechanisms for this task.
arXiv Detail & Related papers (2025-03-04T18:31:02Z)
- Evil twins are not that evil: Qualitative insights into machine-generated prompts [11.42957674201616]
We present the first thorough analysis of opaque machine-generated prompts, or autoprompts.
We find that machine-generated prompts are characterized by a last token that is often intelligible and strongly affects the generation.
Human experts can reliably identify the most influential tokens in an autoprompt a posteriori, suggesting these prompts are not entirely opaque.
arXiv Detail & Related papers (2024-12-11T06:22:44Z)
- Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention [11.073959609358088]
We investigate the processing of garden-path sentences and the fate of lingering misinterpretations using four large language models.
The overall goal is to evaluate whether humans and LLMs are aligned in their processing of garden-path sentences.
Experiments show promising alignment between humans and LLMs in the processing of garden-path sentences.
arXiv Detail & Related papers (2024-05-25T03:36:13Z)
- Transformers Can Represent $n$-gram Language Models [56.06361029539347]
We focus on the relationship between transformer LMs and $n$-gram LMs, a simple and historically relevant class of language models.
We show that transformer LMs using the hard or sparse attention mechanisms can exactly represent any $n$-gram LM.
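For context on the representational target of the result above, an $n$-gram LM is simply a conditional count table over the preceding $n-1$ tokens. A toy bigram sketch (the corpus is an invented example, not from the paper):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Bigram LM: estimate P(next | prev) from raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for prev, ctr in counts.items()}

corpus = "the cat sat on the mat".split()
lm = train_bigram(corpus)
# "the" is followed once by "cat" and once by "mat",
# so P(cat | the) = P(mat | the) = 0.5
```

The paper's claim is that a transformer with hard or sparse attention can attend to exactly the previous $n-1$ tokens and reproduce such a lookup table exactly.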
arXiv Detail & Related papers (2024-04-23T12:51:37Z)
- DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines [41.779902953557425]
Chaining language model (LM) calls as composable modules is fueling a new way of programming.
We introduce LM Assertions, a construct for expressing computational constraints that LMs should satisfy.
We present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.
arXiv Detail & Related papers (2023-12-20T19:13:26Z)
- Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning [36.8749786658624]
Large Language Models (LLMs) exhibit zero-shot mathematical reasoning capacity as a behavior that emerges with scale.
We show that small LMs can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task.
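The formalize-then-solve framing can be illustrated with a hedged sketch: the LM's only job is to translate the word problem into a formal arithmetic expression, which a small symbolic evaluator then solves. The problem text and expression below are invented examples, not the paper's pipeline.

```python
import ast
import operator

# Whitelisted operators keep the evaluator a pure arithmetic solver.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def solve(expr: str):
    """Symbolically evaluate the arithmetic expression the LM produced."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError(f"unsupported node: {node!r}")
    return ev(ast.parse(expr, mode="eval").body)

# The LM's job is only to formalize, e.g.:
# "Sam had 12 apples, gave away 5, then bought 3 more" -> "12 - 5 + 3"
formalized = "12 - 5 + 3"
answer = solve(formalized)   # 10
```

The division of labor is the point: a small LM can learn the translation step even when it cannot reliably do the arithmetic itself.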
arXiv Detail & Related papers (2023-12-09T13:20:49Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs perform tasks by cheating with answers memorized from the pretraining corpus, or via a multi-step reasoning mechanism.
We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks [98.5311231450689]
In-context learning (ICL) has played an essential role in utilizing large language models (LLMs).
This study is the first to explore ICL for speech classification tasks with a textless speech LM.
arXiv Detail & Related papers (2023-10-19T05:31:45Z)
- Augmented Language Models: a Survey [55.965967655575454]
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools.
We refer to them as Augmented Language Models (ALMs).
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks.
arXiv Detail & Related papers (2023-02-15T18:25:52Z)
- Prompting as Probing: Using Language Models for Knowledge Base Construction [1.6050172226234583]
We present ProP (Prompting as Probing), which utilizes GPT-3, a large Language Model originally proposed by OpenAI in 2020.
ProP implements a multi-step approach that combines a variety of prompting techniques to achieve this.
Our evaluation study indicates that these proposed techniques can substantially enhance the quality of the final predictions.
arXiv Detail & Related papers (2022-08-23T16:03:50Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term that pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
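One way to read the regularization idea in the entry above: alongside the usual cross-entropy on reference tokens, add a KL term pulling the TM's output distribution toward the LM prior. A minimal numpy sketch under that reading (the exact loss shape and the weight `lam` are assumptions, not necessarily the paper's formulation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tm_loss_with_lm_prior(tm_logits, lm_logits, target_ids, lam=0.5):
    """Cross-entropy on the reference tokens plus a KL regularizer that
    pushes the TM's output distribution toward the LM prior."""
    p_tm = softmax(tm_logits)    # (T, vocab) TM distributions per position
    p_lm = softmax(lm_logits)    # (T, vocab) LM prior distributions
    ce = -np.mean(np.log(p_tm[np.arange(len(target_ids)), target_ids]))
    kl = np.mean(np.sum(p_tm * (np.log(p_tm) - np.log(p_lm)), axis=-1))
    return ce + lam * kl

rng = np.random.default_rng(0)
T, V = 4, 10   # toy sequence length and vocabulary size
loss = tm_loss_with_lm_prior(rng.normal(size=(T, V)),
                             rng.normal(size=(T, V)),
                             rng.integers(0, V, size=T))
```

When the TM already agrees with the LM prior, the KL term vanishes and only the translation loss remains, which is why limited monolingual data can still help.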
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.