Related papers: How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

URL: http://arxiv.org/abs/2502.11196v1
Date: Sun, 16 Feb 2025 16:55:43 GMT
Title: How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Authors: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen,
Abstract summary: Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge.<n>We identify computational subgraphs that facilitate knowledge storage and processing.
Score: 92.88889953768455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.

Related papers

Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation [77.10390725623125]
retrieval-augmented generation (RAG) is widely employed to expand their knowledge scope.<n>Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility.<n>We present a systematic investigation of the intrinsic mechanisms by which RAGs integrate internal (parametric) and external (retrieved) knowledge.
arXiv Detail & Related papers (2025-05-17T13:13:13Z)
Towards Understanding How Knowledge Evolves in Large Vision-Language Models [55.82918299608732]
We investigate how multimodal knowledge evolves and eventually induces natural languages in Large Vision-Language Models (LVLMs) We identify two key nodes in knowledge evolution: the critical layers and the mutation layers, dividing the evolution process into three stages: rapid evolution, stabilization, and mutation. Our research is the first to reveal the trajectory of knowledge evolution in LVLMs, providing a fresh perspective for understanding their underlying mechanisms.
arXiv Detail & Related papers (2025-03-31T17:35:37Z)
Effective LLM Knowledge Learning via Model Generalization [73.16975077770765]
Large language models (LLMs) are trained on enormous documents that contain extensive world knowledge. It is still not well-understood how knowledge is acquired via autoregressive pre-training. In this paper, we focus on understanding and improving LLM knowledge learning.
arXiv Detail & Related papers (2025-03-05T17:56:20Z)
Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners [0.0]
We introduce Composite Learning Units (CLUs) designed to transform reasoners into learners capable of continuous learning. CLUs are built on an architecture that allows a reasoning model to maintain and evolve a dynamic knowledge repository. We demonstrate CLUs' effectiveness through a cryptographic reasoning task, where they continuously evolve their understanding through feedback to uncover hidden transformation rules.
arXiv Detail & Related papers (2024-10-09T02:27:58Z)
Knowledge Mechanisms in Large Language Models: A Survey and Perspective [88.51320482620679]
This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution.<n>We discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address.
arXiv Detail & Related papers (2024-07-22T06:15:59Z)
InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods for integrating knowledge have been developed, which augment LLMs with domain-specific knowledge graphs through external modules. Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap of known knowledge. A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z)
A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches. We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injecting from knowledge graphs to improve language understanding abilities. Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs. We propose a novel KEPLM named DKPLM that Decomposes Knowledge injection process of the Pre-trained Language Models in pre-training, fine-tuning and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z)
Towards Continual Knowledge Learning of Language Models [11.000501711652829]
Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast amount of web corpus. In real-world scenarios, the world knowledge stored in the LMs can quickly become outdated as the world changes. We formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL)
arXiv Detail & Related papers (2021-10-07T07:00:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.