Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning
- URL: http://arxiv.org/abs/2405.18292v1
- Date: Tue, 28 May 2024 15:47:11 GMT
- Title: Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning
- Authors: Renzhi Wang, Piji Li,
- Abstract summary: We propose a semantic perspective to investigate the reasons behind PEFT's limitations in knowledge learning task.
PEFT presents a notable risk of pushing the model away from the intended knowledge target.
We introduce a data filtering strategy to exclude data that is detrimental to knowledge learning and a re-weighted learning strategy to make the model attentive to semantic distance.
- Score: 30.831866499812925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of Large Language Models (LLMs) to various downstream applications. However, the effectiveness of the PEFT diminishes notably when downstream tasks require accurate learning of factual knowledge. In this paper, we adopt a semantic perspective to investigate this phenomenon, uncovering the reasons behind PEFT's limitations in knowledge learning task. Our findings reveal that: (1) PEFT presents a notable risk of pushing the model away from the intended knowledge target; (2) multiple knowledge interfere with each other, and such interference suppresses the learning and expression of knowledge features. Based on these insights, we introduce a data filtering strategy to exclude data that is detrimental to knowledge learning and a re-weighted learning strategy to make the model attentive to semantic distance during knowledge learning. Experimental results demonstrate the effectiveness of the proposed method on open-source large language model, further validate the semantic challenge in PEFT, thus paving the way for future research.
Related papers
- Dissecting Fine-Tuning Unlearning in Large Language Models [12.749301272512222]
Fine-tuning-based unlearning methods prevail for preventing harmful, sensitive, or copyrighted information within large language models.
However, the true effectiveness of these methods is unclear.
In this work, we delve into the limitations of fine-tuning-based unlearning through activation patching and restoration experiments.
arXiv Detail & Related papers (2024-10-09T06:58:09Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA)
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - UNLEARN Efficient Removal of Knowledge in Large Language Models [1.9797215742507548]
This paper proposes a novel method to achieve this objective called UNLEARN.
The approach builds upon subspace methods to identify and specifically target the removal of knowledge without adversely affecting other knowledge in the LLM.
Results demonstrate 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model.
arXiv Detail & Related papers (2024-08-08T00:53:31Z) - Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z) - Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning [1.2248793682283963]
This study aims to tackle data shortage issues in student learning records to enhance DKT performance for personalized adaptive learning (PAL)
It employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT.
The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance.
arXiv Detail & Related papers (2024-04-25T00:23:20Z) - Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges [11.228131492745842]
Large language models (LLMs) have spurred a new research paradigm in natural language processing.
Despite their excellent capability in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application.
Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern.
arXiv Detail & Related papers (2023-11-27T12:37:51Z) - Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z) - Thrust: Adaptively Propels Large Language Models with External Knowledge [58.72867916604562]
Large-scale pre-trained language models (PTLMs) are shown to encode rich knowledge in their model parameters.
The inherent knowledge in PTLMs can be opaque or static, making external knowledge necessary.
We propose the instance-level adaptive propulsion of external knowledge (IAPEK), where we only conduct the retrieval when necessary.
arXiv Detail & Related papers (2023-07-19T20:16:46Z) - A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge from vision-language pre-training models to mine its potential in knowledge reasoning.
Our scheme is able to not only provide guidance for knowledge retrieval, but also drop these instances potentially error-prone towards question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z) - Ontology-enhanced Prompt-tuning for Few-shot Learning [41.51144427728086]
Few-shot Learning is aimed to make predictions based on a limited number of samples.
Structured data such as knowledge graphs and ontology libraries has been leveraged to benefit the few-shot setting in various tasks.
arXiv Detail & Related papers (2022-01-27T05:41:36Z) - Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active strategies are based on uncertain sample selection, and even often restricted to samples lying close to the decision boundary.
Here we propose to take into consideration common domain-knowledge and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.