Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting
- URL: http://arxiv.org/abs/2409.14747v2
- Date: Tue, 24 Sep 2024 05:27:24 GMT
- Title: Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting
- Authors: Dasol Choi, Dongbin Na,
- Abstract summary: Recent studies have presented various machine unlearning algorithms to make a trained model unlearn the data to be forgotten.
We propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preventing correlation collapse.
Our method synthesizes data samples so that the generated data distribution is far from the distribution of samples being forgotten in the feature space.
- Score: 4.220336689294245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the explosive growth of deep learning applications, the right to be forgotten has become increasingly in demand in various AI industries. For example, given a facial recognition system, some individuals may wish to remove images that might have been used in the training phase from the trained model. Unfortunately, modern deep neural networks sometimes unexpectedly leak personal identities. Recent studies have presented various machine unlearning algorithms to make a trained model unlearn the data to be forgotten. While these methods generally perform well in terms of forgetting scores, we have found that an unexpected modelutility drop can occur. This phenomenon, which we term correlation collapse, happens when the machine unlearning algorithms reduce the useful correlation between image features and the true label. To address this challenge, we propose Distribution-Level Feature Distancing (DLFD), a novel method that efficiently forgets instances while preventing correlation collapse. Our method synthesizes data samples so that the generated data distribution is far from the distribution of samples being forgotten in the feature space, achieving effective results within a single training epoch. Through extensive experiments on facial recognition datasets, we demonstrate that our approach significantly outperforms state-of-the-art machine unlearning methods.
Related papers
- When to Forget? Complexity Trade-offs in Machine Unlearning [23.507879460531264]
Machine Unlearning (MU) aims at removing the influence of specific data points from a trained model.
We analyze the efficiency of unlearning methods and establish the first upper and lower bounds on minimax times for this problem.
We provide a phase diagram for the unlearning complexity ratio -- a novel metric that compares the computational cost of the best unlearning method to full model retraining.
arXiv Detail & Related papers (2025-02-24T16:56:27Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a forget distribution''
We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Accurate Forgetting for All-in-One Image Restoration Model [3.367455972998532]
Currently, a low-cost scheme called Machine Unlearning forgets the private data remembered in the model.
Inspired by this, we try to use this concept to bridge the gap between the fields of image restoration and security.
arXiv Detail & Related papers (2024-09-01T10:14:16Z) - An Information Theoretic Approach to Machine Unlearning [45.600917449314444]
Key challenge in unlearning is forgetting the necessary data in a timely manner, while preserving model performance.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Dataset Condensation Driven Machine Unlearning [0.0]
Current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning.
We propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency.
We present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks.
arXiv Detail & Related papers (2024-01-31T21:48:25Z) - Robust Machine Learning by Transforming and Augmenting Imperfect
Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Segue: Side-information Guided Generative Unlearnable Examples for
Facial Privacy Protection in Real World [64.4289385463226]
We propose Segue: Side-information guided generative unlearnable examples.
To improve transferability, we introduce side information such as true labels and pseudo labels.
It can resist JPEG compression, adversarial training, and some standard data augmentations.
arXiv Detail & Related papers (2023-10-24T06:22:37Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - Generative Adversarial Networks Unlearning [13.342749941357152]
Machine unlearning has emerged as a solution to erase training data from trained machine learning models.
Research on Generative Adversarial Networks (GANs) is limited due to their unique architecture, including a generator and a discriminator.
We propose a cascaded unlearning approach for both item and class unlearning within GAN models, in which the unlearning and learning processes run in a cascaded manner.
arXiv Detail & Related papers (2023-08-19T02:21:21Z) - Class-wise Federated Unlearning: Harnessing Active Forgetting with Teacher-Student Memory Generation [11.638683787598817]
We propose a neuro-inspired federated unlearning framework based on active forgetting.
Our framework distinguishes itself from existing methods by utilizing new memories to overwrite old ones.
Our method achieves satisfactory unlearning completeness against backdoor attacks.
arXiv Detail & Related papers (2023-07-07T03:07:26Z) - Learning to Jump: Thinning and Thickening Latent Counts for Generative
Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the
Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.