Related papers: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

URL: http://arxiv.org/abs/2506.09923v1
Date: Wed, 11 Jun 2025 16:43:36 GMT
Title: Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Authors: Liou Tang, James Joshi, Ashish Kundu,
Abstract summary: We propose a novel privacy attack that infers whether a data sample has been unlearned, following a strict threat model.<n>Our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
Score: 0.6117371161379209
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.

Related papers

No Query, No Access [50.18709429731724]
We introduce the textbfVictim Data-based Adrial Attack (VDBA), which operates using only victim texts.<n>To prevent access to the victim model, we create a shadow dataset with publicly available pre-trained models and clustering methods.<n>Experiments on the Emotion and SST5 datasets show that VDBA outperforms state-of-the-art methods, achieving an ASR improvement of 52.08%.
arXiv Detail & Related papers (2025-05-12T06:19:59Z)
A Method to Facilitate Membership Inference Attacks in Deep Learning Models [5.724311218570013]
We demonstrate a new form of membership inference attack that is strictly more powerful than prior art. Our attack empowers the adversary to reliably de-identify all the training samples. We show that the models can effectively disguise the amplified membership leakage under common membership privacy auditing.
arXiv Detail & Related papers (2024-07-02T03:33:42Z)
SnatchML: Hijacking ML models without Training Access [5.005171792255858]
We consider a stronger threat model for an inference hijacking-time attack, where the adversary has no access to the training phase of the victim model.<n>We propose SnatchML, a new training-free model hijacking attack.<n>Our results on models deployed on AWS Sagemaker showed that SnatchML can deliver high accuracy on hijacking tasks.
arXiv Detail & Related papers (2024-06-03T18:04:37Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
Membership Inference Attacks on Diffusion Models via Quantile Regression [30.30033625685376]
We demonstrate a privacy vulnerability of diffusion models through amembership inference (MI) attack. Our proposed MI attack learns quantile regression models that predict (a quantile of) the distribution of reconstruction loss on examples not used in training. We show that our attack outperforms the prior state-of-the-art attack while being substantially less computationally expensive.
arXiv Detail & Related papers (2023-12-08T16:21:24Z)
Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access. We investigate factors influencing the success of model extraction attacks. Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model attacks (MIAs) aim to recover private data from inaccessible training sets of deep learning models. This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label. Experimental results show that this method can generate similar and accurate samples to the target label, outperforming generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments. We propose a novel method, CANIFE, that uses carefully crafted samples by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples. We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning [27.494206948563885]
We present the first systematic evaluation of membership inference attacks against transfer learning models. Experiments on four real-world image datasets show that membership inference can achieve effective performance. Our results shed light on the severity of membership risks stemming from machine learning models in practice.
arXiv Detail & Related papers (2020-09-10T14:14:22Z)
Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce sampling attack, a novel membership inference technique that unlike other standard membership adversaries is able to work under severe restriction of no access to scores of the victim model. We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance. For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.