Enhance Lifelong Model Editing with Continuous Data-Adapter Association
- URL: http://arxiv.org/abs/2408.11869v1
- Date: Mon, 19 Aug 2024 02:27:00 GMT
- Title: Enhance Lifelong Model Editing with Continuous Data-Adapter Association
- Authors: Jiaang Li, Quan Wang, Zhongnan Wang, Yongdong Zhang, Zhendong Mao,
- Abstract summary: Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors.
Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification.
We propose ELDER, textbfEnhancing textbfLifelong motextbfDel textbfEditing with mixtutextbfRe of Low-Rank Adapter (LoRA)
- Score: 55.697627106315004
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Most model editing methods are solely designed for single-time use and lead to a significant forgetting effect after sequential edits over time, referred to as lifelong editing. Current approaches manage sequential edits by freezing original parameters and allocating new adapters for each knowledge modification. However, these methods lack robustness to minor input variations. To address this challenge, we propose ELDER, \textbf{E}nhancing \textbf{L}ifelong mo\textbf{D}el \textbf{E}diting with mixtu\textbf{R}e of Low-Rank Adapter (LoRA). ELDER is an adaptive approach that integrates multiple LoRAs through a router network. It learns to create a continuous and smooth association between data and adapters, thereby enhancing robustness and generalization to semantically equivalent inputs. Additionally, we introduce a novel loss to help learn associations between adapter allocations and edit semantics. A deferral mechanism is also proposed to retain the original LLM capabilities post-edit. Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER effectively edits models in the lifelong setting and exhibits strong scalability, while retaining LLM's general abilities on downstream tasks.
Related papers
- O-Edit: Orthogonal Subspace Editing for Language Model Sequential Editing [0.0]
Large language models (LLMs) acquire knowledge during pre-training, but over time, this knowledge may become incorrect or outdated, necessitating updates after training.
We propose Orthogonal Subspace Editing, O-Edit. This algorithmizes the direction of each knowledge update, minimizing interference between successive updates and reducing the impact of new updates on unrelated knowledge.
It can perform thousands of edits on mainstream LLMs, achieving an average performance improvement that is 4.2 times better than existing methods while effectively preserving the model's performance on downstream tasks, all with minimal additional parameter overhead.
arXiv Detail & Related papers (2024-10-15T10:16:45Z) - Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and.
Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting.
LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - Adaptive Adapter Routing for Long-Tailed Class-Incremental Learning [55.384428765798496]
New data exhibits a long-tailed distribution, such as e-commerce platform reviews.
This necessitates continuous model learning imbalanced data without forgetting.
We introduce AdaPtive Adapter RouTing (APART) as an exemplar-free solution for LTCIL.
arXiv Detail & Related papers (2024-09-11T17:52:00Z) - LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models [30.831866499812925]
Large language models (LLMs) require continual knowledge updates to stay abreast of the ever-changing world facts.
We introduce LEMoE, an advanced Mixture of Experts (MoE) adaptor for lifelong model editing.
arXiv Detail & Related papers (2024-06-28T16:17:41Z) - DAFNet: Dynamic Auxiliary Fusion for Sequential Model Editing in Large Language Models [32.598670876662375]
A Dynamic Auxiliary Fusion Network (DAFNet) is designed to enhance the semantic interaction among the factual knowledge within the entire sequence.
DAFNet significantly outperforms strong baselines in single-turn and sequential editing.
arXiv Detail & Related papers (2024-05-31T02:56:49Z) - The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks.
benchmarking Large Language Models after each edit is impractically time-consuming and resource-intensive.
We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
arXiv Detail & Related papers (2024-02-15T01:50:38Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge of large language models (LLMs)
We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching.
We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$oplus$OS on the textscCounterFact and zsRE datasets.
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - Massive Editing for Large Language Models via Meta Learning [27.972194696587813]
Large language models (LLMs) have enabled learning knowledge from the pre-training corpora, but the acquired knowledge may be fundamentally incorrect or outdated over time.
We propose the MAssive Language Model Editing Network (MALMEN), which formulates the parameter shift aggregation as the least square problem.
Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B)
arXiv Detail & Related papers (2023-11-08T13:03:06Z) - Aging with GRACE: Lifelong Model Editing with Discrete Key-Value
Adaptors [53.819805242367345]
We propose GRACE, a lifelong model editing method, which implements spot-fixes on streaming errors of a deployed model.
GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights.
Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs.
arXiv Detail & Related papers (2022-11-20T17:18:22Z) - Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.