Machine Unlearning: Learning, Polluting, and Unlearning for Spam Email
- URL: http://arxiv.org/abs/2111.14609v1
- Date: Fri, 26 Nov 2021 12:13:11 GMT
- Title: Machine Unlearning: Learning, Polluting, and Unlearning for Spam Email
- Authors: Nishchal Parne, Kyathi Puppaala, Nithish Bhupathi and Ripon Patgiri
- Abstract summary: Several spam email detection methods exist, each of which employs a different algorithm to detect undesired spam emails.
Many attackers exploit the model by polluting its training data in various ways.
Retraining is impractical in most cases, as the model has already been trained on a massive amount of data.
Unlearning is fast, easy to implement, easy to use, and effective.
- Score: 0.9176056742068814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning for security is studied in this context. Several spam
email detection methods exist, each of which employs a different algorithm to
detect undesired spam emails. But these models are vulnerable to attacks: many
attackers exploit a model by polluting its training data in various ways. To
act deftly in such situations, the model needs to readily unlearn the polluted
data without retraining. Retraining is impractical in most cases, as the model
has already been trained on a massive amount of data, all of which would have
to be trained on again just to remove a small amount of polluted data, often
significantly less than 1% of the total. This problem can be solved by
developing unlearning frameworks for all spam detection models. In this
research, an unlearning module is integrated into spam detection models based
on the Naive Bayes, Decision Tree, and Random Forest algorithms. To assess the
benefits of unlearning over retraining, the three spam detection models are
polluted and exploited from the attacker's position, demonstrating the models'
vulnerability. The resulting reductions in accuracy and true positive rate
show the effect of pollution in each case. Unlearning modules are then
integrated into the models and the polluted data is unlearned; testing the
models after unlearning shows that their performance is restored. Unlearning
and retraining times are also compared across different pollution data sizes
on all models. The findings show that unlearning is considerably superior to
retraining: it is fast, easy to implement, easy to use, and effective.
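The abstract does not say how the unlearning module works internally. For count-based learners such as Naive Bayes, though, exact unlearning reduces to subtracting the polluted samples' counts from the model's sufficient statistics; the Python sketch below illustrates that idea (the class and method names are illustrative, not the paper's implementation):

```python
from collections import defaultdict
import math

class UnlearnableNaiveBayes:
    """Multinomial Naive Bayes for spam filtering whose sufficient
    statistics (counts) can be decremented, so unlearning a polluted
    email is exact and as cheap as learning it was."""

    def __init__(self):
        self.class_counts = defaultdict(int)   # emails seen per class
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.total_words = defaultdict(int)    # words seen per class
        self.vocab = set()                     # kept even after unlearning,
                                               # only for smoothing stability
    def learn(self, words, label):
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocab.add(w)

    def unlearn(self, words, label):
        # Exact inverse of learn(): subtract the same counts again.
        self.class_counts[label] -= 1
        for w in words:
            self.word_counts[label][w] -= 1
            self.total_words[label] -= 1

    def predict(self, words):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, float("-inf")
        for c, cc in self.class_counts.items():
            if cc == 0:                 # class fully unlearned
                continue
            lp = math.log(cc / n)       # log prior
            for w in words:             # log likelihood, Laplace-smoothed
                lp += math.log((self.word_counts[c][w] + 1) /
                               (self.total_words[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = UnlearnableNaiveBayes()
nb.learn(["free", "money", "now"], "spam")
nb.learn(["meeting", "agenda", "tomorrow"], "ham")
nb.unlearn(["free", "money", "now"], "spam")   # drop a polluted sample
print(nb.predict(["meeting", "tomorrow"]))     # -> ham
```

Decision trees and random forests need more bookkeeping (per-node statistics) to support the same trick, which is presumably why each model gets its own unlearning module in the paper.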
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing the effect of a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows, however, that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z)
- Corrective Machine Unlearning [22.342035149807923]
We formalize Corrective Machine Unlearning as the problem of mitigating the impact of data affected by unknown manipulations on a trained model.
We find most existing unlearning methods, including retraining-from-scratch without the deletion set, require most of the manipulated data to be identified for effective corrective unlearning.
One approach, Selective Synaptic Dampening, achieves limited success, unlearning adverse effects with just a small portion of the manipulated samples in our setting.
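For context, the dampening rule commonly attributed to Selective Synaptic Dampening can be sketched as follows, assuming diagonal Fisher-style importances have already been estimated on the full data and on the manipulated (forget) data; the hyperparameter names here are illustrative:

```python
import numpy as np

def selective_synaptic_dampening(theta, imp_full, imp_forget,
                                 alpha=10.0, lam=1.0):
    """Dampen parameters whose importance to the forget set far exceeds
    their importance to the full training set. All arguments are arrays
    of the same shape (flattened parameters and diagonal importances)."""
    # Parameters dominated by the forget set are flagged as "guilty".
    guilty = imp_forget > alpha * imp_full
    # Shrink factor approaches 0 as the importance ratio grows; capped
    # at 1 so no parameter is ever amplified.
    beta = np.minimum(lam * imp_full / (imp_forget + 1e-12), 1.0)
    return np.where(guilty, beta * theta, theta)
```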
arXiv Detail & Related papers (2024-02-21T18:54:37Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
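A rough numpy sketch of the gradient-projection idea: ascend the loss on the forget data while staying orthogonal to a subspace spanned by retained-data gradients. PGU's actual objective and subspace construction may differ; everything below is an assumption-laden illustration:

```python
import numpy as np

def orthogonal_projection(retain_grads, energy=0.99):
    """Projector onto the complement of the subspace spanned by
    retained-data gradients, so unlearning steps leave it untouched."""
    G = np.stack(retain_grads, axis=1)       # columns = gradient samples
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    # Keep enough leading directions to cover `energy` of the spectrum.
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), energy)) + 1
    B = U[:, :k]                             # basis of the retained subspace
    return np.eye(G.shape[0]) - B @ B.T

def pgu_step(theta, forget_grad, projector, lr=0.1):
    # Ascend the loss on the forget data, restricted to directions the
    # retained data does not care about.
    return theta + lr * (projector @ forget_grad)

# Toy usage: 5 parameters, 3 retained-gradient samples.
rng = np.random.default_rng(0)
P = orthogonal_projection([rng.normal(size=5) for _ in range(3)])
theta_new = pgu_step(np.zeros(5), rng.normal(size=5), P)
```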
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection [49.52429991848581]
We propose a Light and Anti-overfitting Retraining Approach (LARA) for deep variational auto-encoder (VAE) based time series anomaly detection methods.
This work makes three novel contributions: 1) the retraining process is formulated as a convex problem, so it converges quickly and prevents overfitting; 2) a ruminate block is designed that leverages historical data without needing to store it; and 3) it is proven mathematically that, when fine-tuning the latent vector and reconstructed data, linear formations achieve the least adjusting errors between the ground truths and the fine-tuned ones.
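The convexity claim can be illustrated in isolation: if the component being retrained is a linear map from frozen latent codes to reconstruction targets, a ridge least-squares objective has a unique closed-form minimizer. This is an abstraction of the idea, not LARA's ruminate block:

```python
import numpy as np

def retrain_linear_adjustment(Z, X, ridge=1e-3):
    """Closed-form ridge least-squares for a linear map W mapping latent
    codes Z (n x d) to reconstruction targets X (n x m). The objective
    ||Z W - X||^2 + ridge * ||W||^2 is convex with a unique minimizer."""
    d = Z.shape[1]
    # Solve the normal equations (Z'Z + ridge*I) W = Z'X.
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(d), Z.T @ X)
```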
arXiv Detail & Related papers (2023-10-09T12:36:16Z)
- AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
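Both ingredients have standard stand-ins that can be sketched briefly, assuming access to per-example input and parameter gradients; neither function below is the paper's exact recipe:

```python
import numpy as np

def fgsm_example(x, grad_x, eps=0.03):
    """Fast Gradient Sign Method: nudge an input along the sign of the
    loss gradient to obtain an adversarial example near x."""
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def guilty_parameters(param_grads_forget, param_grads_retain, ratio=5.0):
    """Flag parameters whose squared-gradient importance on the forget
    instances dwarfs their importance on the remaining data."""
    imp_f = (np.stack(param_grads_forget) ** 2).mean(axis=0)
    imp_r = (np.stack(param_grads_retain) ** 2).mean(axis=0)
    return imp_f > ratio * imp_r
```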
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Deep Regression Unlearning [6.884272840652062]
We introduce deep regression unlearning methods that generalize well and are robust to privacy attacks.
We conduct regression unlearning experiments for computer vision, natural language processing and forecasting applications.
arXiv Detail & Related papers (2022-10-15T05:00:20Z)
- Zero-Shot Machine Unlearning [6.884272840652062]
Modern privacy regulations grant citizens the right to be forgotten by products, services and companies.
No data related to the training process or training samples may be accessed for unlearning.
We propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer.
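To give a flavor of error-maximizing noise, the sketch below gradient-ascends a synthetic input against a binary logistic model's loss for a fixed label; the paper itself works with deep networks and pairs such noise with gated knowledge transfer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error_maximizing_noise(w, b, y, dim, steps=200, lr=0.1, seed=0):
    """Synthesize an input on which a binary logistic model suffers
    maximal loss for label y, touching only the model's own parameters."""
    x = np.random.default_rng(seed).normal(size=dim)
    for _ in range(steps):
        p = sigmoid(w @ x + b)
        x += lr * (p - y) * w   # gradient ASCENT on cross-entropy loss
    return x

w, b = np.array([1.5, -2.0, 0.5]), 0.1
noise = error_maximizing_noise(w, b, y=1, dim=3)  # model maximally wrong here
```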
arXiv Detail & Related papers (2022-01-14T19:16:09Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
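The closed-form update can be sketched as a single Newton-style step, the standard first-order influence approximation; it assumes the Hessian of the training loss at the current parameters is available and is not taken from the paper's code:

```python
import numpy as np

def influence_unlearn(theta, hessian, removed_grads):
    """Approximate the model retrained without the removed points via a
    closed-form parameter shift: theta' = theta + H^{-1} * sum of the
    removed points' loss gradients at theta."""
    g = np.asarray(removed_grads).sum(axis=0)
    return theta + np.linalg.solve(hessian, g)
```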
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- Certifiable Machine Unlearning for Linear Models [1.484852576248587]
Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted.
We present an experimental study of three state-of-the-art approximate unlearning methods for linear models.
arXiv Detail & Related papers (2021-06-29T05:05:58Z)