Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models
- URL: http://arxiv.org/abs/2311.08011v2
- Date: Fri, 16 Feb 2024 15:49:42 GMT
- Title: Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models
- Authors: Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang
- Abstract summary: We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and the learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning substantially improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
- Score: 53.52344131257681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Large Language Models (LLMs) have showcased their
remarkable capabilities in text understanding and generation. However, even
the strongest LLMs are susceptible to acquiring erroneous or obsolete information
from the training corpus. Direct secondary fine-tuning with data containing new
knowledge may be ineffective in updating knowledge due to the conflict between
old and new knowledge. In this paper, we propose a new paradigm for fine-tuning
called F-Learning (Forgetting before Learning), which employs parametric
arithmetic to facilitate the forgetting of old knowledge and learning of new
knowledge. Experimental results on two publicly available datasets demonstrate
that our proposed F-Learning substantially improves the knowledge-updating
performance of both full fine-tuning and LoRA fine-tuning, while
outperforming the existing baselines in most cases. Moreover, we have also
discovered that forgetting old knowledge by subtracting the parameters of LoRA
can yield a similar effect to subtracting the parameters of full fine-tuning,
and occasionally even surpass it significantly.
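The parametric arithmetic described in the abstract can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' implementation: function names (`forget_then_learn`, `lora_delta`), the scaling factor `lam`, and the use of plain NumPy arrays in place of model state dicts are all assumptions for illustration. The idea is to subtract the parameter shift induced by fine-tuning on old knowledge (for LoRA, that shift is the low-rank product B @ A) before applying the shift learned from new knowledge.

```python
import numpy as np

def forget_then_learn(theta_pre, theta_old_ft, delta_new, lam=1.0):
    """Sketch of F-Learning-style parametric arithmetic (names hypothetical):
    1) forget: subtract the shift induced by fine-tuning on old knowledge,
       theta_forgot = theta_pre - lam * (theta_old_ft - theta_pre)
    2) learn: add the shift obtained by fine-tuning on new knowledge."""
    theta_forgot = {k: theta_pre[k] - lam * (theta_old_ft[k] - theta_pre[k])
                    for k in theta_pre}
    return {k: theta_forgot[k] + delta_new[k] for k in theta_forgot}

def lora_delta(A, B, alpha=1.0):
    """LoRA variant: the parameter shift for a weight matrix is the
    low-rank product B @ A (scaled), so 'forgetting' can subtract this
    product instead of a full fine-tuning delta."""
    return alpha * (B @ A)
```

A usage sketch: fine-tune a copy of the model on the old (obsolete) facts to obtain `theta_old_ft`, subtract that delta from the pretrained weights, then fine-tune on the new facts; with LoRA, only the adapter product needs to be subtracted.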
Related papers
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z) - Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? [33.702498916775426]
We study the impact of new knowledge on the capability of the fine-tuned model to utilize its pre-existing knowledge.
We demonstrate that large language models struggle to acquire new factual knowledge through fine-tuning.
As the examples with new knowledge are eventually learned, they linearly increase the model's tendency to hallucinate.
arXiv Detail & Related papers (2024-05-09T17:00:22Z) - Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning [13.371405067535814]
This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in Large Language Models (LLMs).
We compare different dataset generation strategies -- token-based and fact-based scaling -- to create training data that helps the model learn new information.
Our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge.
arXiv Detail & Related papers (2024-03-30T01:56:07Z) - InfuserKI: Enhancing Large Language Models with Knowledge Graphs via
Infuser-Guided Knowledge Integration [61.554209059971576]
Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains.
Injecting new knowledge poses the risk of forgetting previously acquired knowledge.
We propose a novel Infuser-Guided Knowledge Integration framework.
arXiv Detail & Related papers (2024-02-18T03:36:26Z) - A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication.
This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches.
We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z) - The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM UNIWEB, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z) - Adaptively Integrated Knowledge Distillation and Prediction Uncertainty
for Continual Learning [71.43841235954453]
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
arXiv Detail & Related papers (2023-01-18T05:36:06Z) - Unsupervised Pre-training with Structured Knowledge for Improving
Natural Language Inference [22.648536283569747]
We propose models that leverage structured knowledge in different components of pre-trained models.
Our results show that the proposed models perform better than previous BERT-based state-of-the-art models.
arXiv Detail & Related papers (2021-09-08T21:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.