FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
- URL: http://arxiv.org/abs/2502.19207v1
- Date: Wed, 26 Feb 2025 15:11:03 GMT
- Title: FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
- Authors: Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung
- Abstract summary: We define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method fails to erase interconnected knowledge. Based on the definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world knowledge QA settings. We propose a novel unlearning method, KLUE, which updates only knowledge-related neurons to achieve faithful unlearning.
- Score: 24.858928681280634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that should be removed while retaining knowledge that appears relevant but exists in a completely different context. To resolve this problem, we first define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on this definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world knowledge QA settings. Furthermore, we propose a novel unlearning method, KLUE, which updates only knowledge-related neurons to achieve faithful unlearning. KLUE identifies knowledge neurons using an explainability method and updates only those neurons using selected unforgotten samples. Experimental results demonstrate that widely used unlearning methods fail to ensure faithful unlearning, whereas our method shows significant effectiveness in real-world QA unlearning.
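The abstract describes KLUE only at a high level: score individual neurons with an explainability method, then restrict the unlearning update to the highest-scoring ones. The toy PyTorch snippet below is a minimal sketch of that general recipe, not the paper's implementation; the gradient-times-activation attribution, the top-k selection, the two-layer toy block, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of selective-neuron unlearning: (1) score hidden neurons with a
# gradient-times-activation attribution on forget samples, (2) keep only the
# top-scoring "knowledge neurons", (3) apply gradient ascent on forget samples
# (and descent on retain samples) while zeroing gradients everywhere else.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class ToyBlock(nn.Module):
    """Stand-in for a transformer MLP block; the hidden units are the 'neurons'."""
    def __init__(self, d_model=32, d_hidden=128, n_classes=10):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        h = F.gelu(self.up(x))          # per-neuron activations we attribute over
        return self.down(h), h

model = ToyBlock()

# Synthetic "forget" and "retain" batches (inputs + labels).
x_forget, y_forget = torch.randn(16, 32), torch.randint(0, 10, (16,))
x_retain, y_retain = torch.randn(16, 32), torch.randint(0, 10, (16,))

def neuron_attribution(model, x, y):
    """One simple explainability heuristic: |activation * gradient| per neuron."""
    logits, h = model(x)
    h.retain_grad()
    F.cross_entropy(logits, y).backward()
    return (h * h.grad).abs().mean(dim=0).detach()   # shape: [d_hidden]

scores = neuron_attribution(model, x_forget, y_forget)
mask = torch.zeros_like(scores, dtype=torch.bool)
mask[torch.topk(scores, k=16).indices] = True        # neurons tied to the forget set

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(20):
    optimizer.zero_grad()
    logits_f, _ = model(x_forget)
    logits_r, _ = model(x_retain)
    # Ascend on the forget loss, descend on the retain loss.
    loss = -F.cross_entropy(logits_f, y_forget) + F.cross_entropy(logits_r, y_retain)
    loss.backward()
    with torch.no_grad():
        # Zero gradients for everything except parameters wired to selected neurons,
        # so only the "knowledge neurons" are updated.
        model.up.weight.grad[~mask, :] = 0.0
        model.up.bias.grad[~mask] = 0.0
        model.down.weight.grad[:, ~mask] = 0.0
        model.down.bias.grad.zero_()
    optimizer.step()
```

The design point illustrated here is that full gradients are still computed, but everything outside the selected neurons' parameters is zeroed before the optimizer step, so knowledge stored elsewhere in the network is left untouched.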
Related papers
- Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter [35.65566527544619]
Federated learning is a promising paradigm for privacy-preserving collaborative model training.
We propose FUSED, which first identifies critical layers by analyzing each layer's sensitivity to knowledge.
Sparse adapters are then trained without altering the original parameters, overwriting the knowledge to be unlearned with the remaining knowledge.
arXiv Detail & Related papers (2025-02-28T04:35:26Z)
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models [39.39428450239399]
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material.
Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge.
We introduce KnowUnDo, a benchmark to evaluate whether the unlearning process inadvertently erases essential knowledge.
arXiv Detail & Related papers (2024-07-02T03:34:16Z)
- UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs).
We introduce the concept of ununlearning, in which unlearned knowledge is reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z)
- Challenges with unsupervised LLM knowledge discovery [15.816138136030705]
We show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge.
The idea behind unsupervised knowledge elicitation is that knowledge satisfies a consistency structure, which can be used to discover knowledge (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-15T18:49:43Z)
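The "consistency structure" mentioned in the entry above is exemplified by contrast-consistent search (CCS), one well-known unsupervised elicitation method in which a probe is trained so that a statement and its negation receive complementary probabilities. The snippet below is a minimal sketch of that objective under simplifying assumptions: random vectors stand in for real LLM activations, and the probe, loss terms, and optimizer settings are illustrative rather than taken from the cited paper.

```python
# Illustrative CCS-style objective: train an unsupervised probe so that
# p(statement) and p(negation) sum to one, with a confidence term to rule out
# the trivial 0.5/0.5 solution. Random features stand in for LLM hidden states.
import torch
import torch.nn as nn

torch.manual_seed(0)

d, n = 64, 256                      # feature dimension and number of contrast pairs
h_pos = torch.randn(n, d)           # activations for "statement is true" prompts
h_neg = torch.randn(n, d)           # activations for the negated prompts

probe = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    p_pos = probe(h_pos).squeeze(-1)
    p_neg = probe(h_neg).squeeze(-1)
    # Consistency: the statement and its negation should get complementary probabilities.
    l_cons = ((p_pos - (1.0 - p_neg)) ** 2).mean()
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    l_conf = (torch.minimum(p_pos, p_neg) ** 2).mean()
    (l_cons + l_conf).backward()
    opt.step()
```

The cited paper's argument is that objectives of this kind can be satisfied by features other than the model's knowledge, which is why it concludes that such unsupervised methods do not reliably discover knowledge.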
- Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges [11.228131492745842]
Large language models (LLMs) have spurred a new research paradigm in natural language processing.
Despite their excellent capability in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application.
Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern.
arXiv Detail & Related papers (2023-11-27T12:37:51Z)
- Adaptively Integrated Knowledge Distillation and Prediction Uncertainty for Continual Learning [71.43841235954453]
Current deep learning models often suffer from catastrophic forgetting of old knowledge when continually learning new knowledge.
Existing strategies to alleviate this issue often fix the trade-off between keeping old knowledge (stability) and learning new knowledge (plasticity).
arXiv Detail & Related papers (2023-01-18T05:36:06Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge in vision-language pre-training models to mine its potential for knowledge reasoning.
Our scheme not only provides guidance for knowledge retrieval but also drops instances that are potentially error-prone for question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z)
- Incremental Knowledge Based Question Answering [52.041815783025186]
We propose a new incremental KBQA learning framework that can progressively expand learning capacity as humans do.
Specifically, it comprises a margin-distilled loss and a collaborative selection method to overcome the catastrophic forgetting problem.
The comprehensive experiments demonstrate its effectiveness and efficiency when working with the evolving knowledge base.
arXiv Detail & Related papers (2021-01-18T09:03:38Z)
- Towards a Universal Continuous Knowledge Base [49.95342223987143]
We propose a method for building a continuous knowledge base that can store knowledge imported from multiple neural networks.
We import the knowledge from multiple models to the knowledge base, from which the fused knowledge is exported back to a single model.
Experiments on text classification show promising results.
arXiv Detail & Related papers (2020-12-25T12:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.