EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling
- URL: http://arxiv.org/abs/2412.17249v1
- Date: Mon, 23 Dec 2024 03:47:54 GMT
- Title: EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling
- Authors: Zichen Song, Sitan Huang, Zhongfeng Kang
- Abstract summary: This paper proposes EM-MIAs, a novel ensemble attack method that integrates several existing MIA techniques into an XGBoost-based model to enhance overall attack performance.
Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets.
- Score: 2.494935495983421
- Abstract: With the widespread application of large language models (LLMs), concerns about the privacy leakage of model training data have increasingly become a focus. Membership Inference Attacks (MIAs) have emerged as a critical tool for evaluating the privacy risks associated with these models. Although existing attack methods, such as LOSS, Reference-based, min-k, and zlib, perform well in certain scenarios, their effectiveness on large pre-trained language models often approaches random guessing, particularly for large-scale datasets and single-epoch training. To address this issue, this paper proposes EM-MIAs, a novel ensemble attack method that integrates several existing MIA techniques (LOSS, Reference-based, min-k, zlib) into an XGBoost-based model to enhance overall attack performance. Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets. This indicates that combining the strengths of different methods makes it possible to identify members of the model's training data more effectively, thereby providing a more robust tool for evaluating the privacy risks of LLMs. This study offers new directions for further research in the field of LLM privacy protection and underscores the necessity of developing more powerful privacy auditing methods.
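The abstract does not come with an implementation, so the following is a minimal sketch of the general idea only: per-document MIA signals (LOSS, zlib-calibrated loss, min-k%, and a reference-model delta) are computed and fed as features into an XGBoost classifier. It assumes a HuggingFace causal LM as the target, a second causal LM sharing the same tokenizer as the reference, and a labeled member/non-member calibration set; the helper names, feature formulations, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Sketch of an EM-MIAs-style ensemble (illustrative, not the authors' implementation).
import zlib

import numpy as np
import torch
from sklearn.metrics import roc_auc_score
from transformers import AutoModelForCausalLM, AutoTokenizer
from xgboost import XGBClassifier


@torch.no_grad()
def token_logprobs(model, tokenizer, text, device="cpu"):
    """Per-token log-probabilities of `text` under `model`."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(device)
    logits = model(ids).logits[:, :-1]                       # predictions for tokens 1..n-1
    targets = ids[:, 1:]
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1).squeeze(0).cpu().numpy()


def mia_features(text, target, ref, tokenizer, device="cpu", k=0.2):
    """One feature vector per document: LOSS, zlib-calibrated, min-k%, reference-calibrated."""
    lp = token_logprobs(target, tokenizer, text, device)
    loss = -lp.mean()                                        # LOSS attack signal
    zlib_entropy = len(zlib.compress(text.encode("utf-8")))  # zlib calibration term
    min_k = -np.sort(lp)[: max(1, int(k * len(lp)))].mean()  # mean of the lowest k% log-probs
    # Assumes the reference model shares the target's tokenizer (a simplification).
    ref_loss = -token_logprobs(ref, tokenizer, text, device).mean()
    return [loss, loss / zlib_entropy, min_k, loss - ref_loss]


def fit_ensemble(texts, labels, target, ref, tokenizer, device="cpu"):
    """Train the XGBoost combiner on labeled member / non-member documents."""
    X = np.array([mia_features(t, target, ref, tokenizer, device) for t in texts])
    y = np.array(labels)
    clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
    clf.fit(X, y)
    print("train AUC-ROC:", roc_auc_score(y, clf.predict_proba(X)[:, 1]))
    return clf
```

The intended benefit of such a combiner is that XGBoost can learn which of the individual signals is informative in a given regime, rather than relying on any single fixed threshold.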
Related papers
- Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning [0.5452584641316627]
Our research identifies key limitations in existing optimization models for privacy preservation.
We propose a novel multi-objective optimization model that simultaneously minimizes information loss and maximizes protection against attacks.
arXiv Detail & Related papers (2025-01-02T01:52:36Z)
- Order of Magnitude Speedups for LLM Membership Inference [5.124111136127848]
Large Language Models (LLMs) promise to revolutionize computing broadly, but their complexity and extensive training data also expose privacy vulnerabilities.
One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs).
We propose a low-cost MIA that leverages an ensemble of small quantile regression models to determine if a document belongs to the model's training set or not.
arXiv Detail & Related papers (2024-09-22T16:18:14Z)
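For comparison with the XGBoost ensemble above, here is an equally hedged sketch of the quantile-regression idea this entry describes: small regressors trained only on non-member documents predict a per-document loss quantile, and a document is flagged as a member when its loss under the target model falls below that predicted threshold. The TF-IDF featurization, the choice of GradientBoostingRegressor, and all hyperparameters are stand-in assumptions, not the paper's implementation.

```python
# Illustrative quantile-regression MIA sketch (not the paper's code).
# Losses can be computed with a helper like `token_logprobs` above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer


def fit_quantile_ensemble(nonmember_texts, nonmember_losses, alpha=0.05, n_models=5):
    """Fit several small regressors that predict the alpha-quantile of the target
    model's loss for a document, using only non-member data."""
    vec = TfidfVectorizer(max_features=2000)
    X = vec.fit_transform(nonmember_texts).toarray()
    losses = np.asarray(nonmember_losses)
    rng = np.random.default_rng(0)
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap for diversity
        m = GradientBoostingRegressor(loss="quantile", alpha=alpha,
                                      max_depth=2, n_estimators=100)
        m.fit(X[idx], losses[idx])
        models.append(m)
    return vec, models


def is_member(text, loss, vec, models):
    """Flag a document as a member if its loss is below the ensemble-averaged
    per-document threshold (members tend to have lower loss)."""
    x = vec.transform([text]).toarray()
    threshold = np.mean([m.predict(x)[0] for m in models])
    return loss < threshold
```

With this setup, the quantile level alpha roughly controls the false-positive rate on non-member documents.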
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
arXiv Detail & Related papers (2024-02-12T17:52:05Z)
- Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
arXiv Detail & Related papers (2023-11-10T13:55:05Z)
- Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models against MI attacks and the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z)
- SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
- Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study [17.421886085918608]
Membership inference attacks (MIAs) aim to infer whether a data point has been used to train a machine learning model.
These attacks can be employed to identify potential privacy vulnerabilities and detect unauthorized use of personal data.
This paper takes a first step towards developing practical MIAs against large-scale multi-modal models.
arXiv Detail & Related papers (2023-09-29T19:38:40Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
- Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)