Massive Editing for Large Language Models via Meta Learning
- URL: http://arxiv.org/abs/2311.04661v3
- Date: Thu, 25 Jan 2024 03:50:57 GMT
- Title: Massive Editing for Large Language Models via Meta Learning
- Authors: Chenmien Tan and Ge Zhang and Jie Fu
- Abstract summary: Large language models (LLMs) have enabled learning knowledge from the pre-training corpora, but the acquired knowledge may be fundamentally incorrect or outdated over time.
We propose the MAssive Language Model Editing Network (MALMEN), which formulates parameter-shift aggregation as a least-squares problem.
Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B).
- Score: 27.972194696587813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) have enabled learning knowledge from the pre-training corpora, the acquired knowledge may be fundamentally incorrect or become outdated over time, which necessitates rectifying the knowledge of the language model (LM) after training. A promising approach employs a hyper-network to generate parameter shifts, but existing hyper-networks scale poorly in the number of edits applied simultaneously. To mitigate the problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates parameter-shift aggregation as a least-squares problem and subsequently updates the LM parameters using the normal equation. To accommodate editing multiple facts simultaneously under limited memory budgets, we separate the computation on the hyper-network and the LM, enabling arbitrary batch sizes on both neural networks. Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed-book fact-checking and question answering. Remarkably, MALMEN can edit hundreds of times more facts than strong baselines with the identical hyper-network architecture and outperforms an editor specifically designed for GPT. Our code is available at https://github.com/ChenmienTan/malmen.
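A minimal sketch of the normal-equation aggregation described above, assuming a ridge-regularized least-squares formulation over per-edit key vectors u_i and value shifts d_i for a single linear layer. The notation, the regularizer lam, and the function name aggregate_shifts are illustrative assumptions rather than the paper's exact formulation; see the repository above for the authors' implementation.

```python
import torch

def aggregate_shifts(keys: torch.Tensor, value_shifts: torch.Tensor,
                     lam: float = 1e-4) -> torch.Tensor:
    """Aggregate per-edit updates into one weight shift S by solving
        min_S  sum_i ||S u_i - d_i||^2 + lam * ||S||_F^2,
    whose normal-equation solution is S = D U^T (U U^T + lam I)^{-1}.

    keys:         (n_edits, d_in)  per-edit key vectors u_i (assumed inputs)
    value_shifts: (n_edits, d_out) per-edit value shifts d_i (assumed inputs)
    returns:      (d_out, d_in)    aggregated weight shift S
    """
    U = keys.T           # (d_in, n_edits)
    D = value_shifts.T   # (d_out, n_edits)
    A = U @ U.T + lam * torch.eye(U.shape[0], dtype=U.dtype, device=U.device)
    # Solve A X = U D^T and transpose; since A is symmetric, X^T = D U^T A^{-1}.
    return torch.linalg.solve(A, U @ D.T).T
```

The layer weight would then be updated as W + S. Because each (u_i, d_i) pair can be computed and cached before this aggregation, the hyper-network and the edited LM need not share a single forward pass, which is one reading of the abstract's claim that both networks can run with arbitrary batch sizes.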
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Enhance Lifelong Model Editing with Continuous Data-Adapter Association [55.697627106315004]
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors.
Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification.
We propose ELDER, Enhancing Lifelong moDel Editing with a mixtuRe of Low-Rank Adapters (LoRA).
arXiv Detail & Related papers (2024-08-19T02:27:00Z) - DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models [32.598670876662375]
A Dynamic Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic interaction among the factual knowledge within the entire sequence.
DAFNet significantly outperforms strong baselines in single-turn and sequential editing.
arXiv Detail & Related papers (2024-05-31T02:56:49Z) - Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z) - Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Model editing has recently emerged as a promising technique for efficiently updating a small amount of knowledge in large language models (LLMs).
We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching.
We demonstrate the overall state-of-the-art (SOTA) performance of SWEA⊕OS on the CounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained
Language Models for Question Answering over Knowledge Graph [142.42275983201978]
We propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning.
We also adopt an adaptation tuning strategy to adapt the model parameters with 20,000 subgraphs with synthesized questions.
Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data.
arXiv Detail & Related papers (2023-12-30T07:18:54Z) - G-SPEED: General SParse Efficient Editing MoDel [25.48360227520061]
General SParse Efficient Editing MoDel (G-SPEED).
arXiv Detail & Related papers (2023-10-16T15:01:18Z) - Editing Factual Knowledge in Language Models [51.947280241185]
We present KnowledgeEditor, a method that can be used to edit this knowledge.
Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
arXiv Detail & Related papers (2021-04-16T15:24:42Z)