Related papers: Improved Membership Inference Attacks Against Language Classification Models

Improved Membership Inference Attacks Against Language Classification Models

URL: http://arxiv.org/abs/2310.07219v2
Date: Thu, 18 Jul 2024 12:55:29 GMT
Title: Improved Membership Inference Attacks Against Language Classification Models
Authors: Shlomit Shachor, Natalia Razinkov, Abigail Goldsteen,
Abstract summary: We present a novel framework for running membership inference attacks against classification models. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.

Related papers

Breaking the Gold Standard: Extracting Forgotten Data under Exact Unlearning in Large Language Models [26.5039481643457]
We introduce a novel data extraction attack that compromises even exact unlearning.<n>We demonstrate our attack's effectiveness on a simulated medical diagnosis dataset.
arXiv Detail & Related papers (2025-05-30T09:09:33Z)
Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models. We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z)
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack. When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks [1.03590082373586]
We provide the first systematic review of the vulnerability of large language models to membership inference attacks. We find that some training methods provide significantly reduced privacy risk, with the combination of differential privacy and low-rank adaptors achieving the best privacy protection against these attacks.
arXiv Detail & Related papers (2024-03-13T12:46:51Z)
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
Leveraging Adversarial Examples to Quantify Membership Information Leakage [30.55736840515317]
We develop a novel approach to address the problem of membership inference in pattern recognition models. We argue that this quantity reflects the likelihood of belonging to the training data. Our method performs comparable or even outperforms state-of-the-art strategies.
arXiv Detail & Related papers (2022-03-17T19:09:38Z)
Enhanced Membership Inference Attacks against Machine Learning Models [9.26208227402571]
Membership inference attacks are used to quantify the private information that a model leaks about the individual data points in its training set. We derive new attack algorithms that can achieve a high AUC score while also highlighting the different factors that affect their performance. Our algorithms capture a very precise approximation of privacy loss in models, and can be used as a tool to perform an accurate and informed estimation of privacy risk in machine learning models.
arXiv Detail & Related papers (2021-11-18T13:31:22Z)
Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective substitute training that focuses on designing the distribution of data used in the knowledge stealing process. The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters. We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data. Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples. We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
Systematic Evaluation of Privacy Risks of Machine Learning Models [41.017707772150835]
We show that prior work on membership inference attacks may severely underestimate the privacy risks. We first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks. We then introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score.
arXiv Detail & Related papers (2020-03-24T00:53:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.