Related papers: Order of Magnitude Speedups for LLM Membership Inference

Order of Magnitude Speedups for LLM Membership Inference

URL: http://arxiv.org/abs/2409.14513v2
Date: Tue, 24 Sep 2024 17:48:58 GMT
Title: Order of Magnitude Speedups for LLM Membership Inference
Authors: Rongting Zhang, Martin Bertran, Aaron Roth,
Abstract summary: Large Language Models (LLMs) have the promise to revolutionize computing broadly, but their complexity and extensive training data also expose privacy vulnerabilities. One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs) We propose a low-cost MIA that leverages an ensemble of small quantile regression models to determine if a document belongs to the model's training set or not.
Score: 5.124111136127848
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have the promise to revolutionize computing broadly, but their complexity and extensive training data also expose significant privacy vulnerabilities. One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs), wherein an adversary aims to determine whether a specific data point was part of the model's training set. Although this is a known risk, state of the art methodologies for MIAs rely on training multiple computationally costly shadow models, making risk evaluation prohibitive for large models. Here we adapt a recent line of work which uses quantile regression to mount membership inference attacks; we extend this work by proposing a low-cost MIA that leverages an ensemble of small quantile regression models to determine if a document belongs to the model's training set or not. We demonstrate the effectiveness of this approach on fine-tuned LLMs of varying families (OPT, Pythia, Llama) and across multiple datasets. Across all scenarios we obtain comparable or improved accuracy compared to state of the art shadow model approaches, with as little as 6% of their computation budget. We demonstrate increased effectiveness across multi-epoch trained target models, and architecture miss-specification robustness, that is, we can mount an effective attack against a model using a different tokenizer and architecture, without requiring knowledge on the target model.

Related papers

SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning [4.275908952997288]
Federated Learning (FL) is a collaborative learning framework designed to protect client data.<n>Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to model extraction attacks.<n>This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks.
arXiv Detail & Related papers (2025-05-25T22:40:10Z)
A hierarchical approach for assessing the vulnerability of tree-based classification models to membership inference attack [0.552480439325792]
Machine learning models can inadvertently expose confidential properties of their training data, making them vulnerable to membership inference attacks (MIA) This article presents two new complementary approaches for efficiently identifying vulnerable tree-based models.
arXiv Detail & Related papers (2025-02-13T15:16:53Z)
EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling [2.494935495983421]
This paper proposes a novel ensemble attack method that integrates several existing MIAs techniques into an XGBoost-based model to enhance overall attack performance (EM-MIAs) Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets.
arXiv Detail & Related papers (2024-12-23T03:47:54Z)
Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods [6.902279764206365]
Membership inference attacks (MIAs) are widely used to assess privacy risks in machine learning models. State-of-the-art methods require training hundreds of shadow models with the same architecture as the target model. We propose a novel approach for identifying the training samples most vulnerable to membership inference attacks by analyzing artifacts naturally available during the training process.
arXiv Detail & Related papers (2024-11-08T18:04:41Z)
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs) New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data. We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters. We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
arXiv Detail & Related papers (2024-02-12T17:52:05Z)
MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning [6.510488168434277]
The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model. We propose an enhanced Membership Inference Attack with the Batch-wise generated Attack dataset (MIA-BAD) We show how training an ML model through FL, has some distinct advantages and investigate how the threat introduced with the proposed MIA-BAD approach can be mitigated with FL approaches.
arXiv Detail & Related papers (2023-11-28T06:51:26Z)
Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study [17.421886085918608]
Membership inference attacks (MIAs) aim to infer whether a data point has been used to train a machine learning model. These attacks can be employed to identify potential privacy vulnerabilities and detect unauthorized use of personal data. This paper takes a first step towards developing practical MIAs against large-scale multi-modal models.
arXiv Detail & Related papers (2023-09-29T19:38:40Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc. We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing. Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
Adversarial Learning with Cost-Sensitive Classes [7.6596177815175475]
It is necessary to improve the performance of some special classes or to particularly protect them from attacks in adversarial learning. This paper proposes a framework combining cost-sensitive classification and adversarial learning together to train a model that can distinguish between protected and unprotected classes.
arXiv Detail & Related papers (2021-01-29T03:15:40Z)
Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters. We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data. Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)
Privacy Analysis of Deep Learning in the Wild: Membership Inference Attacks against Transfer Learning [27.494206948563885]
We present the first systematic evaluation of membership inference attacks against transfer learning models. Experiments on four real-world image datasets show that membership inference can achieve effective performance. Our results shed light on the severity of membership risks stemming from machine learning models in practice.
arXiv Detail & Related papers (2020-09-10T14:14:22Z)
How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new MI attacks to utilize the information of augmented data. We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.