Selective Forgetting: Advancing Machine Unlearning Techniques and
Evaluation in Language Models
- URL: http://arxiv.org/abs/2402.05813v1
- Date: Thu, 8 Feb 2024 16:50:01 GMT
- Title: Selective Forgetting: Advancing Machine Unlearning Techniques and
Evaluation in Language Models
- Authors: Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong and Georg
Gottlob
- Abstract summary: This study investigates concerns related to neural models inadvertently retaining personal or sensitive data.
A novel approach is introduced to achieve precise and selective forgetting within language models.
Two innovative evaluation metrics are proposed: Sensitive Information Extraction Likelihood (S-EL) and Sensitive Information Memory Accuracy (S-MA)
- Score: 24.784439330058095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this study is to investigate Machine Unlearning (MU), a burgeoning
field focused on addressing concerns related to neural models inadvertently
retaining personal or sensitive data. Here, a novel approach is introduced to
achieve precise and selective forgetting within language models. Unlike
previous methodologies that adopt completely opposing training objectives, this
approach aims to mitigate adverse effects on language model performance,
particularly in generation tasks. Furthermore, two innovative evaluation
metrics are proposed: Sensitive Information Extraction Likelihood (S-EL) and
Sensitive Information Memory Accuracy (S-MA), designed to gauge the
effectiveness of sensitive information elimination. To reinforce the forgetting
framework, an effective method for annotating sensitive scopes is presented,
involving both online and offline strategies. The online selection mechanism
leverages language probability scores to ensure computational efficiency, while
the offline annotation entails a robust two-stage process based on Large
Language Models (LLMs).
Related papers
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs)
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
arXiv Detail & Related papers (2024-10-08T10:26:22Z) - Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose two novel techniques for robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal.
And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning [7.557226714828334]
We present a novel unlearning mechanism designed to remove the impact of specific data samples from a neural network.
In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model.
Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task.
arXiv Detail & Related papers (2024-07-01T00:20:26Z) - Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - DPP-Based Adversarial Prompt Searching for Lanugage Models [56.73828162194457]
Auto-regressive Selective Replacement Ascent (ASRA) is a discrete optimization algorithm that selects prompts based on both quality and similarity with determinantal point process (DPP)
Experimental results on six different pre-trained language models demonstrate the efficacy of ASRA for eliciting toxic content.
arXiv Detail & Related papers (2024-03-01T05:28:06Z) - Machine unlearning through fine-grained model parameters perturbation [26.653596302257057]
We propose fine-grained Top-K and Random-k parameters perturbed inexact machine unlearning strategies.
We also tackle the challenge of evaluating the effectiveness of machine unlearning.
arXiv Detail & Related papers (2024-01-09T07:14:45Z) - DUCK: Distance-based Unlearning via Centroid Kinematics [40.2428948628001]
This work introduces a novel unlearning algorithm, denoted as Distance-based Unlearning via Centroid Kinematics (DUCK)
evaluation of the algorithm's performance is conducted across various benchmark datasets.
We also introduce a novel metric, called Adaptive Unlearning Score (AUS), encompassing not only the efficacy of the unlearning process in forgetting target data but also quantifying the performance loss relative to the original model.
arXiv Detail & Related papers (2023-12-04T17:10:25Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.