Related papers: Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models

Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models

URL: http://arxiv.org/abs/2506.03434v1
Date: Tue, 03 Jun 2025 22:35:09 GMT
Title: Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models
Authors: Ahmad Dawar Hakimi, Ali Modarressi, Philipp Wicke, Hinrich Schütze,
Abstract summary: We analyze the evolution of factual knowledge representation in the OLMo-7B model.<n>Our results show that LLMs initially depend on broad, general-purpose components, which later specialize as training progresses.
Score: 47.82491185709275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding how large language models (LLMs) acquire and store factual knowledge is crucial for enhancing their interpretability and reliability. In this work, we analyze the evolution of factual knowledge representation in the OLMo-7B model by tracking the roles of its attention heads and feed forward networks (FFNs) over the course of pre-training. We classify these components into four roles: general, entity, relation-answer, and fact-answer specific, and examine their stability and transitions. Our results show that LLMs initially depend on broad, general-purpose components, which later specialize as training progresses. Once the model reliably predicts answers, some components are repurposed, suggesting an adaptive learning process. Notably, attention heads display the highest turnover. We also present evidence that FFNs remain more stable throughout training. Furthermore, our probing experiments reveal that location-based relations converge to high accuracy earlier in training than name-based relations, highlighting how task complexity shapes acquisition dynamics. These insights offer a mechanistic view of knowledge formation in LLMs.

Related papers

How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.<n>We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
Enhancing Multi-Hop Fact Verification with Structured Knowledge-Augmented Large Language Models [26.023148371263012]
We propose a novel Structured Knowledge-Augmented LLM-based Network (LLM-SKAN) for multi-hop fact verification.<n>Specifically, we utilize an LLM-driven Knowledge Extractor to capture fine-grained information, including entities and their complicated relations.<n>The experimental results on four common-used datasets demonstrate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2025-03-11T14:47:24Z)
LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.<n>Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
Learning Beyond the Surface: How Far Can Continual Pre-Training with LoRA Enhance LLMs' Domain-Specific Insight Learning? [4.390998479503661]
Large Language Models (LLMs) have demonstrated remarkable performance on various tasks.<n>However, their ability to extract and internalize deeper insights from domain-specific datasets remains underexplored.<n>This study investigates how continual pre-training can enhance LLMs' capacity for insight learning.
arXiv Detail & Related papers (2025-01-29T18:40:32Z)
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between the output probabilities and the pretraining data frequency.<n>This study demonstrates that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
How Do Large Language Models Acquire Factual Knowledge During Pretraining? [36.59608982935844]
We study how large language models (LLMs) acquire factual knowledge during pretraining. Findings reveal several important insights into the dynamics of factual knowledge acquisition during pretraining.
arXiv Detail & Related papers (2024-06-17T17:54:40Z)
Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning. This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z)
TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models [31.209774088374374]
This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. We employ a robust approach to inject knowledge triples and employ a knowledge-augmented memory bank to capture valuable information. We show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs in knowledge probing tasks and multiple knowledge-aware language understanding tasks.
arXiv Detail & Related papers (2024-03-17T13:04:35Z)
EpiK-Eval: Evaluation for Language Models as Epistemic Models [16.485951373967502]
We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. We argue that these shortcomings stem from the intrinsic nature of prevailing training objectives. The findings from this study offer insights for developing more robust and reliable LLMs.
arXiv Detail & Related papers (2023-10-23T21:15:54Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP) What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.