Machine Unlearning for Document Classification
- URL: http://arxiv.org/abs/2404.19031v1
- Date: Mon, 29 Apr 2024 18:16:13 GMT
- Title: Machine Unlearning for Document Classification
- Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
- Abstract summary: A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data.
This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications.
- Score: 14.71726430657162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a well-trained model and possesses only a small portion of training data. This setup is designed for efficient forgetting manipulation. This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications. Our code is publicly available at https://github.com/leitro/MachineUnlearning-DocClassification.
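To make the setting above concrete: a common class-level unlearning recipe for this scenario is to fine-tune the served model on the small retained subset while re-labeling forget-class samples with random other classes, overwriting the model's knowledge of the forgotten class without full retraining. The sketch below is a minimal illustration under those assumptions; `model`, `retained_loader`, `num_classes`, and `forget_class` are hypothetical names, and this is not necessarily the exact procedure from the paper (whose code is at the GitHub link above).

```python
# Minimal sketch of class-level unlearning via fine-tuning on a small
# retained subset: forget-class samples get random other labels, so the
# model's knowledge of that class is overwritten, while the remaining
# samples keep their true labels to preserve utility.
# `model`, `retained_loader`, `num_classes`, `forget_class` are
# hypothetical; this is NOT necessarily the paper's exact procedure.
import torch
import torch.nn.functional as F

def unlearn_class(model, retained_loader, forget_class, num_classes,
                  epochs=5, lr=1e-4, device="cpu"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in retained_loader:
            images, labels = images.to(device), labels.to(device)
            mask = labels == forget_class
            if mask.any():
                # Draw random replacement labels for forget-class samples.
                rand = torch.randint(0, num_classes, (int(mask.sum()),),
                                     device=device)
                # Shift any draw that lands on the forget class itself.
                hit = rand == forget_class
                rand[hit] = (rand[hit] + 1) % num_classes
                labels = labels.clone()
                labels[mask] = rand
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

Random relabeling only touches the forget class, which is what lets the server keep accuracy on the classes it is still allowed to remember while needing only the small retained portion of the training data.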
Related papers
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z)
- Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generative models can produce high-quality fake videos that are indistinguishable from real ones, posing a serious social threat.
Traditional forgery detection methods rely on directly centralizing the training data.
The paper proposes a novel federated face forgery detection approach with personalized representations.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
- Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning [0.0]
We describe and propose alternative evaluation methods for machine unlearning algorithms.
We show the utility of our alternative evaluations via a series of experiments with state-of-the-art unlearning algorithms on different computer vision datasets.
arXiv Detail & Related papers (2024-05-29T15:53:23Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
- A Survey of Machine Unlearning [43.272767023563254]
Recent regulations require that private information about a user can be removed from computer systems in general and from ML models in particular upon request.
This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data.
We seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- Lightweight machine unlearning in neural network [2.406359246841227]
The "right to be forgotten" was introduced in a timely manner, stipulating that individuals have the right to withdraw consent to the use of their personal data.
To solve this problem, machine unlearning is proposed, which allows the model to erase all memory of private information.
Our method is 15 times faster than retraining.
arXiv Detail & Related papers (2021-11-10T04:48:31Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that hides private information in the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
- An Overview of Privacy in Machine Learning [2.8935588665357077]
This document provides background information on relevant concepts around machine learning and privacy.
We discuss possible adversarial models and settings, and cover a wide range of attacks relating to the leakage of private and/or sensitive information.
arXiv Detail & Related papers (2020-05-18T13:05:17Z)
- When Machine Unlearning Jeopardizes Privacy [25.167214892258567]
We investigate the unintended information leakage caused by machine unlearning.
We propose a novel membership inference attack that achieves strong performance.
Our results can help improve privacy protection in practical implementations of machine unlearning; a rough sketch of the attack intuition appears after this list.
arXiv Detail & Related papers (2020-05-05T14:11:52Z)
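The last entry above argues that unlearning itself can leak information: an attacker who can query the model both before and after a deletion request can use the change in outputs as a membership signal. Below is a rough, hedged sketch of that intuition only, not the paper's published attack; `original_model`, `unlearned_model`, and the threshold are illustrative assumptions.

```python
# Rough sketch of membership inference against unlearning: a sample that
# was actually deleted tends to show a larger shift in model posterior
# between the original and unlearned models than a sample that was never
# in the training set. The models and threshold here are illustrative
# assumptions, not the published attack from the paper above.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_score(original_model, unlearned_model, x):
    """Higher score -> more evidence the sample was unlearned (a member)."""
    p_before = F.softmax(original_model(x), dim=-1)
    p_after = F.softmax(unlearned_model(x), dim=-1)
    # L1 distance between the two posteriors as a simple change measure.
    return (p_before - p_after).abs().sum(dim=-1)

def infer_membership(original_model, unlearned_model, x, threshold=0.5):
    return membership_score(original_model, unlearned_model, x) > threshold
```

In practice an attacker would calibrate the threshold on shadow models rather than fixing it by hand; the point here is only that the before/after difference is itself a side channel.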