Machine Unlearning for Document Classification
- URL: http://arxiv.org/abs/2404.19031v1
- Date: Mon, 29 Apr 2024 18:16:13 GMT
- Title: Machine Unlearning for Document Classification
- Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
- Abstract summary: A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data.
This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications.
- Score: 14.71726430657162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has recently been proposed, allowing users to request the removal of private information from computer systems and neural network models. A novel approach, known as machine unlearning, has emerged to make AI models forget about a particular class of data. In our research, we explore machine unlearning for document classification problems, representing, to the best of our knowledge, the first investigation into this area. Specifically, we consider a realistic scenario where a remote server houses a well-trained model and possesses only a small portion of training data. This setup is designed for efficient forgetting manipulation. This work represents a pioneering step towards the development of machine unlearning methods aimed at addressing privacy concerns in document analysis applications. Our code is publicly available at https://github.com/leitro/MachineUnlearning-DocClassification.
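To make the setting above concrete: a common class-level unlearning recipe for this scenario is to fine-tune the served model on the small retained subset while re-labeling forget-class samples with random other classes, overwriting the model's knowledge of the forgotten class without full retraining. The sketch below is a minimal illustration under those assumptions; `model`, `retained_loader`, `num_classes`, and `forget_class` are hypothetical names, and this is not necessarily the exact procedure from the paper (whose code is at the GitHub link above).

```python
# Minimal sketch of class-level unlearning via fine-tuning on a small
# retained subset: forget-class samples get random other labels, so the
# model's knowledge of that class is overwritten, while the remaining
# samples keep their true labels to preserve utility.
# `model`, `retained_loader`, `num_classes`, `forget_class` are
# hypothetical; this is NOT necessarily the paper's exact procedure.
import torch
import torch.nn.functional as F

def unlearn_class(model, retained_loader, forget_class, num_classes,
                  epochs=5, lr=1e-4, device="cpu"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in retained_loader:
            images, labels = images.to(device), labels.to(device)
            mask = labels == forget_class
            if mask.any():
                # Draw random replacement labels for forget-class samples.
                rand = torch.randint(0, num_classes, (int(mask.sum()),),
                                     device=device)
                # Shift any draw that lands on the forget class itself.
                hit = rand == forget_class
                rand[hit] = (rand[hit] + 1) % num_classes
                labels = labels.clone()
                labels[mask] = rand
            optimizer.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

Random relabeling only touches the forget class, which is what lets the server keep accuracy on the classes it is still allowed to remember while needing only the small retained portion of the training data.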
Related papers
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z)
- Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generative models can produce high-quality fake videos that are indistinguishable from real ones, posing a serious social threat.
Traditional forgery detection methods rely on directly centralizing the training data.
The paper proposes a novel federated face forgery detection approach with personalized representations.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
- Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning [0.0]
We describe and propose alternative evaluation methods for machine unlearning algorithms.
We show the utility of our alternative evaluations via a series of experiments with state-of-the-art unlearning algorithms on different computer vision datasets.
arXiv Detail & Related papers (2024-05-29T15:53:23Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
- A Survey of Machine Unlearning [43.272767023563254]
Recent regulations require that private information about a user can be removed from computer systems in general and from ML models in particular upon request.
This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data.
We seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- Lightweight machine unlearning in neural network [2.406359246841227]
The "right to be forgotten" was introduced in a timely manner, stipulating that individuals have the right to withdraw consent to the use of their personal data.
To solve this problem, machine unlearning is proposed, which allows the model to erase all memory of private information.
Our method is 15 times faster than retraining.
arXiv Detail & Related papers (2021-11-10T04:48:31Z)
- TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations [49.20701800683092]
We present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representation.
The goal of this framework is to learn a feature extractor that hides private information in the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks.
arXiv Detail & Related papers (2020-05-23T06:21:26Z)
- An Overview of Privacy in Machine Learning [2.8935588665357077]
This document provides background information on relevant concepts around machine learning and privacy.
We discuss possible adversarial models and settings, and cover a wide range of attacks relating to the leakage of private and/or sensitive information.
arXiv Detail & Related papers (2020-05-18T13:05:17Z)
- When Machine Unlearning Jeopardizes Privacy [25.167214892258567]
We investigate the unintended information leakage caused by machine unlearning.
We propose a novel membership inference attack that achieves strong performance.
Our results can help improve privacy protection in practical implementations of machine unlearning; a rough sketch of the attack intuition appears after this list.
arXiv Detail & Related papers (2020-05-05T14:11:52Z)
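The last entry above argues that unlearning itself can leak information: an attacker who can query the model both before and after a deletion request can use the change in outputs as a membership signal. Below is a rough, hedged sketch of that intuition only, not the paper's published attack; `original_model`, `unlearned_model`, and the threshold are illustrative assumptions.

```python
# Rough sketch of membership inference against unlearning: a sample that
# was actually deleted tends to show a larger shift in model posterior
# between the original and unlearned models than a sample that was never
# in the training set. The models and threshold here are illustrative
# assumptions, not the published attack from the paper above.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_score(original_model, unlearned_model, x):
    """Higher score -> more evidence the sample was unlearned (a member)."""
    p_before = F.softmax(original_model(x), dim=-1)
    p_after = F.softmax(unlearned_model(x), dim=-1)
    # L1 distance between the two posteriors as a simple change measure.
    return (p_before - p_after).abs().sum(dim=-1)

def infer_membership(original_model, unlearned_model, x, threshold=0.5):
    return membership_score(original_model, unlearned_model, x) > threshold
```

In practice an attacker would calibrate the threshold on shadow models rather than fixing it by hand; the point here is only that the before/after difference is itself a side channel.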