Related papers: Computation Mechanism Behind LLM Position Generalization

URL: http://arxiv.org/abs/2503.13305v1
Date: Mon, 17 Mar 2025 15:47:37 GMT
Title: Computation Mechanism Behind LLM Position Generalization
Authors: Chi Han, Heng Ji,
Abstract summary: Large language models (LLMs) exhibit flexibility in handling textual positions. They can understand texts with position perturbations and generalize to longer texts. This work connects the linguistic phenomenon with LLMs' computational mechanisms.
Score: 59.013857707250814
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Most written natural languages are composed of sequences of words and sentences. Similar to humans, large language models (LLMs) exhibit flexibility in handling textual positions - a phenomenon we term position generalization. They can understand texts with position perturbations and generalize to longer texts than those encountered during training with the latest techniques. These phenomena suggest that LLMs handle positions tolerantly, but how LLMs computationally process positional relevance remains largely unexplored. This work connects the linguistic phenomenon with LLMs' computational mechanisms. We show how LLMs enforce certain computational mechanisms for the aforementioned tolerance in position perturbations. Despite the complex design of the self-attention mechanism, this work reveals that LLMs learn a counterintuitive disentanglement of attention logits. Their values show a 0.959 linear correlation with an approximation of the arithmetic sum of positional relevance and semantic importance. Furthermore, we identify a prevalent pattern in intermediate features, which we prove theoretically enables this effect. The pattern, which is different from how randomly initialized parameters would behave, suggests that it is a learned behavior rather than a natural result of the model architecture. Based on these findings, we provide computational explanations and criteria for LLMs' position flexibilities. This work takes a pioneering step in linking position generalization with modern LLMs' internal mechanisms.
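The abstract's claim that attention logits correlate linearly with a sum of positional relevance and semantic importance can be illustrated with a small sketch. This is not the paper's procedure: the function names (additive_fit, additivity_score), the row-and-column-mean fit, and the synthetic logits are all assumptions made for illustration. The sketch fits a table of logits, indexed by relative distance d and key k, with an additive form f[d] + g[k], then reports the Pearson correlation between the table and the fit.

```python
import math
import random


def additive_fit(logits):
    """Least-squares fit of logits[d][k] by f[d] + g[k] on a full grid.

    Hypothetical helper: on a complete table the best additive fit is
    row mean + column mean - grand mean.
    """
    n_rows, n_cols = len(logits), len(logits[0])
    grand = sum(sum(row) for row in logits) / (n_rows * n_cols)
    row_mean = [sum(row) / n_cols for row in logits]
    col_mean = [sum(logits[d][k] for d in range(n_rows)) / n_rows
                for k in range(n_cols)]
    f = row_mean                          # positional term, keeps the grand mean
    g = [c - grand for c in col_mean]     # semantic term, centered at zero
    return f, g


def pearson(xs, ys):
    """Pearson linear correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def additivity_score(logits):
    """Correlation between the logits and their best additive approximation."""
    f, g = additive_fit(logits)
    actual = [v for row in logits for v in row]
    fitted = [f[d] + g[k]
              for d in range(len(logits))
              for k in range(len(logits[0]))]
    return pearson(actual, fitted)


# Synthetic logits (an assumption, not model data): a distance-dependent
# term plus a key-dependent term plus small noise.
rng = random.Random(0)
positional = [-0.05 * d for d in range(32)]
semantic = [rng.gauss(0.0, 1.0) for _ in range(16)]
logits = [[positional[d] + semantic[k] + rng.gauss(0.0, 0.1)
           for k in range(16)]
          for d in range(32)]
score = additivity_score(logits)
print(f"additivity score: {score:.3f}")
```

A score close to 1 means the table is well described by separate positional and semantic terms. Applying this to a real model would require extracting attention logits for a fixed query across distances and keys, which the sketch does not attempt.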

Related papers

LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models [3.7965260744113163]
Large Language Models (LLMs) are becoming increasingly popular in pervasive computing due to their versatility and strong performance. This paper investigates how linguistic abstraction emerges in LLMs, aiming to detect it across different modules. Attention-based explanations collapsed once we tested the core assumption that later-layer representations still correspond to tokens. Property-inference methods applied to embeddings also failed because their high predictive scores were driven by methodological artifacts and dataset structure.
arXiv Detail & Related papers (2026-01-30T12:46:37Z)
Framework for Machine Evaluation of Reasoning Completeness in Large Language Models For Classification Tasks [0.0]
This paper introduces RACE (Reasoning Alignment for Completeness of Explanations). We analyze four widely used text classification datasets: WIKI ONTOLOGY, AG NEWS, IMDB, and GOEMOTIONS. We show that correct predictions exhibit higher coverage of supporting features, while incorrect predictions are associated with elevated coverage of contradicting features.
arXiv Detail & Related papers (2025-10-23T20:22:22Z)
When can isotropy help adapt LLMs' next word prediction to numerical domains? [53.98633183204453]
It is shown that the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations. Experiments show that different characteristics of numerical data and model architectures have different impacts on isotropy.
arXiv Detail & Related papers (2025-05-22T05:10:34Z)
A Modular Dataset to Demonstrate LLM Abstraction Capability [3.0899016152680754]
Large language models (LLMs) exhibit impressive capabilities but struggle with reasoning errors due to hallucinations and flawed logic. We introduce ArrangementPuzzle, a novel puzzle dataset with structured solutions and automated stepwise correctness verification. We trained a classifier model on LLM activations on this dataset and found that it achieved over 80% accuracy in predicting reasoning correctness.
arXiv Detail & Related papers (2025-03-22T04:25:30Z)
Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning [9.75748930802634]
We study the power of in-context learning to improve machine learning performance. We consider an extremely simple domain: next token prediction on simple language tasks. We find that LLMs uniformly underperform n-gram models on this task.
arXiv Detail & Related papers (2025-01-06T07:57:51Z)
Language Agents Meet Causality -- Bridging LLMs and Causal World Models [50.79984529172807]
We propose a framework that integrates causal representation learning with large language models. This framework learns a causal world model, with causal variables linked to natural language expressions. We evaluate the framework on causal inference and planning tasks across temporal scales and environmental complexities.
arXiv Detail & Related papers (2024-10-25T18:36:37Z)
Large Language Models and the Extended Church-Turing Thesis [0.0]
We investigate the computational power of large language models (LLMs) by the classical means of computability and computational complexity theory. We show that any fixed (non-adaptive) LLM is computationally equivalent to a, possibly very large, deterministic finite-state transducer. We discuss the merits of our findings in the broader context of several related disciplines and philosophies.
arXiv Detail & Related papers (2024-09-11T03:09:55Z)
Misinforming LLMs: vulnerabilities, challenges and opportunities [4.54019093815234]
Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. This paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. Research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs.
arXiv Detail & Related papers (2024-08-02T10:35:49Z)
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective [2.0755366440393743]
Large Language Models (LLMs) have demonstrated unprecedented prowess across various natural language processing tasks in various application domains. Recent studies show that LLMs can be leveraged to perform lexical semantic tasks, such as Knowledge Base Completion (KBC) or Ontology Learning (OL). This paper investigates the question: Do LLMs really adapt to domains and remain consistent in the extraction of structured knowledge, or do they only learn lexical senses instead of reasoning?
arXiv Detail & Related papers (2024-07-29T13:29:43Z)
Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics. We find interesting potential: LLMs can indeed capture semantic structures, and scaling-up doesn't always mirror potential. We are surprised to discover that significant overlap in the errors is made by both LLMs and untrained humans, accounting for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z)
The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLM). We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions. Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure [2.530495315660486]
We investigate the degree to which pre-trained Transformer-based large language models represent relationships between contexts. We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts. However, LLMs fail at generalizations between related contexts that have not been observed during pre-training.
arXiv Detail & Related papers (2023-11-08T18:58:43Z)
Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.