How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
- URL: http://arxiv.org/abs/2502.11196v1
- Date: Sun, 16 Feb 2025 16:55:43 GMT
- Title: How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
- Authors: Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen,
- Abstract summary: Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge.
We identify computational subgraphs that facilitate knowledge storage and processing.
- Score: 92.88889953768455
- License:
- Abstract: Despite exceptional capabilities in knowledge-intensive tasks, Large Language Models (LLMs) face a critical gap in understanding how they internalize new knowledge, particularly how to structurally embed acquired knowledge in their neural computations. We address this issue through the lens of knowledge circuit evolution, identifying computational subgraphs that facilitate knowledge storage and processing. Our systematic analysis of circuit evolution throughout continual pre-training reveals several key findings: (1) the acquisition of new knowledge is influenced by its relevance to pre-existing knowledge; (2) the evolution of knowledge circuits exhibits a distinct phase shift from formation to optimization; (3) the evolution of knowledge circuits follows a deep-to-shallow pattern. These insights not only advance our theoretical understanding of the mechanisms of new knowledge acquisition in LLMs, but also provide potential implications for improving continual pre-training strategies to enhance model performance. Code and data will be available at https://github.com/zjunlp/DynamicKnowledgeCircuits.
Related papers
- Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners [0.0]
We introduce Composite Learning Units (CLUs) designed to transform reasoners into learners capable of continuous learning.
CLUs are built on an architecture that allows a reasoning model to maintain and evolve a dynamic knowledge repository.
We demonstrate CLUs' effectiveness through a cryptographic reasoning task, where they continuously evolve their understanding through feedback to uncover hidden transformation rules.
arXiv Detail & Related papers (2024-10-09T02:27:58Z) - Knowledge Mechanisms in Large Language Models: A Survey and Perspective [88.51320482620679]
This paper reviews knowledge mechanism analysis from a novel taxonomy including knowledge utilization and evolution.
We discuss what knowledge LLMs have learned, the reasons for the fragility of parametric knowledge, and the potential dark knowledge (hypothesis) that will be challenging to address.
arXiv Detail & Related papers (2024-07-22T06:15:59Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration [58.61492157691623]
Methods for integrating knowledge have been developed, which augment LLMs with domain-specific knowledge graphs through external modules.
Our research focuses on a novel problem: efficiently integrating unknown knowledge into LLMs without unnecessary overlap of known knowledge.
A risk of introducing new knowledge is the potential forgetting of existing knowledge.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
It is theoretically analyzed that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for
Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injecting from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that Decomposes Knowledge injection process of the Pre-trained Language Models in pre-training, fine-tuning and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z) - Towards Continual Knowledge Learning of Language Models [11.000501711652829]
Large Language Models (LMs) are known to encode world knowledge in their parameters as they pretrain on a vast amount of web corpus.
In real-world scenarios, the world knowledge stored in the LMs can quickly become outdated as the world changes.
We formulate a new continual learning (CL) problem called Continual Knowledge Learning (CKL)
arXiv Detail & Related papers (2021-10-07T07:00:57Z) - Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
Experiments on text classification show promising results.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.