Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
- URL: http://arxiv.org/abs/2406.14091v1
- Date: Thu, 20 Jun 2024 08:12:49 GMT
- Title: Protecting Privacy Through Approximating Optimal Parameters for Sequence Unlearning in Language Models
- Authors: Dohyun Lee, Daniel Rim, Minseok Choi, Jaegul Choo,
- Abstract summary: Language models (LMs) are potentially vulnerable to extraction attacks, which represent a significant privacy risk.
We propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM.
POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin.
- Score: 37.172662930947446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although language models (LMs) demonstrate exceptional capabilities on various tasks, they are potentially vulnerable to extraction attacks, which represent a significant privacy risk. To mitigate the privacy concerns of LMs, machine unlearning has emerged as an important research area, which is utilized to induce the LM to selectively forget about some of its training data. While completely retraining the model will guarantee successful unlearning and privacy assurance, it is impractical for LMs, as it would be time-consuming and resource-intensive. Prior works efficiently unlearn the target token sequences, but upon subsequent iterations, the LM displays significant degradation in performance. In this work, we propose Privacy Protection via Optimal Parameters (POP), a novel unlearning method that effectively forgets the target token sequences from the pretrained LM by applying optimal gradient updates to the parameters. Inspired by the gradient derivation of complete retraining, we approximate the optimal training objective that successfully unlearns the target sequence while retaining the knowledge from the rest of the training data. Experimental results demonstrate that POP exhibits remarkable retention performance post-unlearning across 9 classification and 4 dialogue benchmarks, outperforming the state-of-the-art by a large margin. Furthermore, we introduce Remnant Memorization Accuracy that quantifies privacy risks based on token likelihood and validate its effectiveness through both qualitative and quantitative analyses.
Related papers
- Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [94.13848736705575]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms.
We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels.
Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z) - Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models [94.39278422567955]
Fine-tuning large language models (LLMs) on human preferences has proven successful in enhancing their capabilities.
However, ensuring the safety of LLMs during the fine-tuning remains a critical concern.
We propose a supervised learning framework called Bi-Factorial Preference Optimization (BFPO) to address this issue.
arXiv Detail & Related papers (2024-08-27T17:31:21Z) - Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose two novel techniques for robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach [23.34505448257966]
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks.
Previous work has proposed to determine when to do/skip the retrieval in a data-aware manner by analyzing the LLMs' pretraining data.
These data-aware methods pose privacy risks and memory limitations, especially when requiring access to sensitive or extensive pretraining data.
We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data.
arXiv Detail & Related papers (2024-04-04T15:21:22Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised
Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z) - Knowledge Unlearning for Mitigating Privacy Risks in Language Models [31.322818016245087]
We propose knowledge unlearning as an alternative method to reduce privacy risks for language models.
We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them.
We show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori.
arXiv Detail & Related papers (2022-10-04T10:18:11Z) - Semi-Supervised Off Policy Reinforcement Learning [3.48396189165489]
Health-outcome information is often not well coded but rather embedded in clinical notes.
We propose a semi-supervised learning (SSL) approach that efficiently leverages a small sized labeled data with true outcome observed, and a large unlabeled data with outcome surrogates.
Our method is at least as efficient as the supervised approach, and moreover safe as it robust to mis-specification of the imputation models.
arXiv Detail & Related papers (2020-12-09T00:59:12Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facility the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.