Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models
- URL: http://arxiv.org/abs/2311.08011v2
- Date: Fri, 16 Feb 2024 15:49:42 GMT
- Title: Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models
- Authors: Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang
- Abstract summary: We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and the learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning substantially improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
- Score: 53.52344131257681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have showcased their
remarkable capabilities in text understanding and generation. However, even
the strongest LLMs are susceptible to acquiring erroneous or obsolete information
from the training corpus. Direct secondary fine-tuning with data containing new
knowledge may be ineffective in updating knowledge due to the conflict between
old and new knowledge. In this paper, we propose a new paradigm for fine-tuning
called F-Learning (Forgetting before Learning), which employs parametric
arithmetic to facilitate the forgetting of old knowledge and learning of new
knowledge. Experimental results on two publicly available datasets demonstrate
that our proposed F-Learning substantially improves the knowledge-updating
performance of both full fine-tuning and LoRA fine-tuning, while
outperforming the existing baselines in most cases. Moreover, we have also
discovered that forgetting old knowledge by subtracting the parameters of LoRA
can yield a similar effect to subtracting the parameters of full fine-tuning,
and occasionally even surpass it significantly.
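The parametric arithmetic described in the abstract can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' implementation: function names (`forget_then_learn`, `lora_delta`), the scaling factor `lam`, and the use of plain NumPy arrays in place of model state dicts are all assumptions for illustration. The idea is to subtract the parameter shift induced by fine-tuning on old knowledge (for LoRA, that shift is the low-rank product B @ A) before applying the shift learned from new knowledge.

```python
import numpy as np

def forget_then_learn(theta_pre, theta_old_ft, delta_new, lam=1.0):
    """Sketch of F-Learning-style parametric arithmetic (names hypothetical):
    1) forget: subtract the shift induced by fine-tuning on old knowledge,
       theta_forgot = theta_pre - lam * (theta_old_ft - theta_pre)
    2) learn: add the shift obtained by fine-tuning on new knowledge."""
    theta_forgot = {k: theta_pre[k] - lam * (theta_old_ft[k] - theta_pre[k])
                    for k in theta_pre}
    return {k: theta_forgot[k] + delta_new[k] for k in theta_forgot}

def lora_delta(A, B, alpha=1.0):
    """LoRA variant: the parameter shift for a weight matrix is the
    low-rank product B @ A (scaled), so 'forgetting' can subtract this
    product instead of a full fine-tuning delta."""
    return alpha * (B @ A)
```

A usage sketch: fine-tune a copy of the model on the old (obsolete) facts to obtain `theta_old_ft`, subtract that delta from the pretrained weights, then fine-tune on the new facts; with LoRA, only the adapter product needs to be subtracted.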
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z) - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? [33.702498916775426]
We study the impact of new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge.
We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning.
As the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate.
arXiv Detail & Related papers (2024-05-09T17:00:22Z) - Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs).
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via
Infuser-Guided Knowledge Integration [61.554209059971576]
Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains.
Injecting new knowledge poses the risk of forgetting previously acquired knowledge.
We propose a novel Infuser-Guided Knowledge Integration framework.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM UNIWEB, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z) - Adaptively Integrated Knowledge Distillation and Prediction Uncertainty
for Continual Learning [71.43841235954453]
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
arXiv Detail & Related papers (2023-01-18T05:36:06Z) - Unsupervised Pre-training with Structured Knowledge for Improving
Natural Language Inference [22.648536283569747]
We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
arXiv Detail & Related papers (2021-09-08T21:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.