Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
- URL: http://arxiv.org/abs/2402.05813v1
- Date: Thu, 8 Feb 2024 16:50:01 GMT
- Authors: Lingzhi Wang, Xingshan Zeng, Jinsong Guo, Kam-Fai Wong and Georg Gottlob
- Abstract summary: This study investigates concerns related to neural models inadvertently retaining personal or sensitive data.
A novel approach is introduced to achieve precise and selective forgetting within language models.
Two innovative evaluation metrics are proposed: Sensitive Information Extraction Likelihood (S-EL) and Sensitive Information Memory Accuracy (S-MA).
- Score: 24.784439330058095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this study is to investigate Machine Unlearning (MU), a burgeoning
field focused on addressing concerns related to neural models inadvertently
retaining personal or sensitive data. Here, a novel approach is introduced to
achieve precise and selective forgetting within language models. Unlike
previous methodologies that adopt completely opposing training objectives, this
approach aims to mitigate adverse effects on language model performance,
particularly in generation tasks. Furthermore, two innovative evaluation
metrics are proposed: Sensitive Information Extraction Likelihood (S-EL) and
Sensitive Information Memory Accuracy (S-MA), designed to gauge the
effectiveness of sensitive information elimination. To reinforce the forgetting
framework, an effective method for annotating sensitive scopes is presented,
involving both online and offline strategies. The online selection mechanism
leverages language probability scores to ensure computational efficiency, while
the offline annotation entails a robust two-stage process based on Large
Language Models (LLMs).
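The abstract names two mechanisms without giving formulas: an online selection of sensitive scopes from language probability scores, and the S-MA metric. The following is a minimal sketch, assuming per-token log-probabilities and greedy predictions are already available from the model. The selection rule (flag the least likely `ratio` fraction of tokens), the function names, and the reading of S-MA as memory accuracy restricted to sensitive positions are illustrative assumptions, not the paper's exact definitions. The objective function only computes the masked value; an actual unlearning step would minimise it with an autodiff framework.

```python
from typing import List, Sequence


def select_sensitive_positions(token_logprobs: Sequence[float], ratio: float) -> List[bool]:
    """Hypothetical online selection: flag the `ratio` fraction of tokens the
    model finds least likely as sensitive (an assumed rule for illustration)."""
    n = len(token_logprobs)
    if n == 0:
        return []
    k = max(1, round(ratio * n))  # always flag at least one token
    # Token indices ordered from least likely to most likely.
    order = sorted(range(n), key=lambda i: token_logprobs[i])
    chosen = set(order[:k])
    return [i in chosen for i in range(n)]


def selective_unlearning_objective(token_logprobs: Sequence[float],
                                   mask: Sequence[bool]) -> float:
    """Mean log-likelihood of the flagged tokens only. Minimising this value
    (gradient ascent on their negative log-likelihood) targets only the flagged
    tokens; unflagged tokens contribute nothing to the objective."""
    selected = [lp for lp, m in zip(token_logprobs, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def sensitive_memory_accuracy(predicted_ids: Sequence[int], target_ids: Sequence[int],
                              mask: Sequence[bool]) -> float:
    """Assumed reading of S-MA: share of flagged positions where the model's
    greedy next-token prediction reproduces the ground-truth token."""
    hits = [p == t for p, t, m in zip(predicted_ids, target_ids, mask) if m]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    logprobs = [-0.1, -5.0, -0.2, -3.0]
    mask = select_sensitive_positions(logprobs, ratio=0.5)
    print(mask)  # [False, True, False, True]
    print(selective_unlearning_objective(logprobs, mask))  # -4.0
    print(sensitive_memory_accuracy([7, 8, 9, 4], [7, 2, 9, 4], mask))  # 0.5
```

With these inputs the two least likely tokens (positions 1 and 3) are flagged, so only they enter the objective and the accuracy; restricting both to the flagged scope is what distinguishes selective forgetting from applying an opposing objective to every token.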
Related papers
- ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
(2025-02-16T16:31:00Z)
- Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach [1.3731623617634434]
We identify critical limitations in existing unlearning metrics and propose enhanced evaluation metrics inspired by conformal prediction.
Our metrics can effectively capture the extent to which ground truth labels are excluded from the prediction set.
We propose an unlearning framework that integrates conformal prediction insights into Carlini & Wagner adversarial attack loss.
(2025-01-31T18:58:43Z)
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
(2024-11-05T23:26:10Z)
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs).
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
(2024-10-08T10:26:22Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically assesses the extent of unlearning on specific data pieces and makes iterative updates.
(2024-07-25T07:09:35Z)
- Enhancing Text Classification through LLM-Driven Active Learning and Human Annotation [2.0411082897313984]
This study introduces a novel methodology that integrates human annotators and Large Language Models.
The proposed framework integrates human annotation with the output of LLMs, depending on the model uncertainty levels.
The empirical results show a substantial decrease in the costs associated with data annotation while either maintaining or improving model accuracy.
(2024-06-17T21:45:48Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, GA-based methods, we suggest, are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
(2024-06-13T14:41:00Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
(2024-03-23T09:26:15Z)
- DUCK: Distance-based Unlearning via Centroid Kinematics [40.2428948628001]
This work introduces a novel unlearning algorithm, denoted as Distance-based Unlearning via Centroid Kinematics (DUCK).
The algorithm's performance is evaluated across various benchmark datasets.
We also introduce a novel metric, called Adaptive Unlearning Score (AUS), encompassing not only the efficacy of the unlearning process in forgetting target data but also quantifying the performance loss relative to the original model.
(2023-12-04T17:10:25Z)
- On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
The lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task.
We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models.
Our model provides significant gains in both automatic and human evaluation over strong baselines.
(2020-10-24T04:30:02Z)
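The Redefining Machine Unlearning entry measures whether ground-truth labels are excluded from a prediction set. That idea can be illustrated with standard split conformal prediction; the sketch below is an assumption-laden illustration, not that paper's metric. It uses the common nonconformity score 1 - p(true label), and the function names and the `alpha` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def conformal_threshold(cal_true_probs: Sequence[float], alpha: float) -> float:
    """Split conformal calibration with nonconformity score s = 1 - p(true label).
    Returns the k-th smallest calibration score, k = ceil((n + 1) * (1 - alpha))."""
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label enters the set
    return scores[k - 1]


def prediction_set(class_probs: Sequence[float], threshold: float) -> List[int]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return [c for c, p in enumerate(class_probs) if 1.0 - p <= threshold]


def exclusion_rate(forget_probs: Sequence[Sequence[float]], forget_labels: Sequence[int],
                   threshold: float) -> float:
    """Share of forget-set samples whose ground-truth label falls outside the set."""
    excluded = [y not in prediction_set(probs, threshold)
                for probs, y in zip(forget_probs, forget_labels)]
    return sum(excluded) / len(excluded) if excluded else 0.0


if __name__ == "__main__":
    t = conformal_threshold([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.3], alpha=0.2)
    forget_probs = [[0.6, 0.25, 0.15], [0.1, 0.8, 0.1]]
    print(exclusion_rate(forget_probs, [1, 1], t))  # 0.5
```

A higher exclusion rate on the forget set suggests the model no longer ranks the true labels among its plausible predictions. For data exchangeable with the calibration set, the marginal coverage guarantee of split conformal prediction keeps the expected exclusion rate at or below `alpha`, which gives a reference point for retained data.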