Knowledge Editing for Large Language Models: A Survey
- URL: http://arxiv.org/abs/2310.16218v3
- Date: Thu, 14 Dec 2023 21:49:59 GMT
- Title: Knowledge Editing for Large Language Models: A Survey
- Authors: Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong Li
- Abstract summary: One major drawback of large language models (LLMs) is their substantial computational cost for pre-training.
Knowledge-based Model Editing (KME), which aims to precisely modify LLMs to incorporate specific knowledge, has attracted increasing attention.
- Score: 51.01368551235289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have recently transformed both the academic and
industrial landscapes due to their remarkable capacity to understand, analyze,
and generate texts based on their vast knowledge and reasoning ability.
Nevertheless, one major drawback of LLMs is their substantial computational
cost for pre-training due to their unprecedented amounts of parameters. The
disadvantage is exacerbated when new knowledge frequently needs to be
introduced into the pre-trained model. Therefore, it is imperative to develop
effective and efficient techniques to update pre-trained LLMs. Traditional
methods encode new knowledge in pre-trained LLMs through direct fine-tuning.
However, naively re-training LLMs can be computationally intensive and risks
degrading valuable pre-trained knowledge in the model that is irrelevant to
the update. Recently, Knowledge-based Model Editing (KME) has attracted increasing
attention, which aims to precisely modify the LLMs to incorporate specific
knowledge, without negatively influencing other irrelevant knowledge. In this
survey, we aim to provide a comprehensive and in-depth overview of recent
advances in the field of KME. We first introduce a general formulation of KME
to encompass different KME strategies. Afterward, we provide an innovative
taxonomy of KME techniques based on how the new knowledge is introduced into
pre-trained LLMs, and investigate existing KME strategies while analyzing key
insights, advantages, and limitations of methods from each category. Moreover,
representative metrics, datasets, and applications of KME are introduced
accordingly. Finally, we provide an in-depth analysis regarding the
practicality and remaining challenges of KME and suggest promising research
directions for further advancement in this field.
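The abstract refers to a general formulation of KME without stating it on this page. As a reading aid, a minimal sketch of the formulation commonly used in the knowledge-editing literature is given below; the notation (pre-trained model f_theta, edit request e = (x_e, y_e), edit scope S(e), trade-off weight lambda) is assumed here for illustration and is not quoted from the survey:

  \[
  \min_{\theta'} \;
  \underbrace{\mathbb{E}_{x \in S(e)} \big[\ell\big(f_{\theta'}(x),\, y_e\big)\big]}_{\text{edit success}}
  \;+\;
  \lambda\,
  \underbrace{\mathbb{E}_{x \notin S(e)} \big[\ell\big(f_{\theta'}(x),\, f_{\theta}(x)\big)\big]}_{\text{locality (preserve unrelated knowledge)}}
  \]

Here \(f_\theta\) is the pre-trained LLM, \((x_e, y_e)\) is the new fact to inject, \(S(e)\) is the set of inputs the edit is meant to affect, and the second term penalizes changes to outputs on unrelated inputs, matching the requirement of modifying the model "without negatively influencing other irrelevant knowledge."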
Related papers
- Adapter-based Approaches to Knowledge-enhanced Language Models -- A Survey [48.52320309766703]
Knowledge-enhanced language models (KELMs) have emerged as promising tools to bridge the gap between large-scale language models and domain-specific knowledge.
KELMs can achieve higher factual accuracy and fewer hallucinations by leveraging knowledge graphs (KGs).
arXiv Detail & Related papers (2024-11-25T14:10:24Z) - Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning [15.475427498268393]
The Train-Attention-Augmented Language Model (TAALM) enhances learning efficiency by dynamically predicting and applying weights to tokens based on their usefulness.
We show that TAALM achieves state-of-the-art performance over the baselines and also shows synergistic compatibility when integrated with previous CKL approaches.
arXiv Detail & Related papers (2024-07-24T01:04:34Z) - Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs).
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks [90.11273439036455]
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks.
We propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales from LLMs with augmented knowledge retrieved from an external knowledge base.
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets.
arXiv Detail & Related papers (2023-05-28T13:00:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.