Win-k: Improved Membership Inference Attacks on Small Language Models
- URL: http://arxiv.org/abs/2508.01268v1
- Date: Sat, 02 Aug 2025 08:50:42 GMT
- Title: Win-k: Improved Membership Inference Attacks on Small Language Models
- Authors: Roya Arkhmammadova, Hosein Madadi Tamar, M. Emre Gursoy
- Abstract summary: We study membership inference attacks (MIAs) on small language models (SLMs). We propose a new MIA called win-k, which builds on top of a state-of-the-art attack (min-k).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small language models (SLMs) are increasingly valued for their efficiency and deployability in resource-constrained environments, making them useful for on-device, privacy-sensitive, and edge computing applications. On the other hand, membership inference attacks (MIAs), which aim to determine whether a given sample was used in a model's training, are an important threat with serious privacy and intellectual property implications. In this paper, we study MIAs on SLMs. Although MIAs were shown to be effective on large language models (LLMs), they are relatively less studied on emerging SLMs, and furthermore, their effectiveness decreases as models get smaller. Motivated by this finding, we propose a new MIA called win-k, which builds on top of a state-of-the-art attack (min-k). We experimentally evaluate win-k by comparing it with five existing MIAs using three datasets and eight SLMs. Results show that win-k outperforms existing MIAs in terms of AUROC, TPR @ 1% FPR, and FPR @ 99% TPR metrics, especially on smaller models.
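The abstract names a baseline attack (min-k) and three evaluation metrics. As a rough illustration only, the sketch below assumes the commonly described form of a min-k style score (the average of the lowest k fraction of per-token log-probabilities) and computes AUROC and TPR at a fixed FPR from membership scores. The function names, the default `k`, and the toy numbers are hypothetical and are not taken from the paper; win-k itself is not sketched, since this listing does not describe how it modifies min-k.

```python
import math
from typing import Sequence


def min_k_score(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Average of the lowest k fraction of token log-probabilities.

    Assumption: a higher (less negative) score suggests the text is more
    likely to have been seen during training.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, math.ceil(k * len(token_logprobs)))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def auroc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores: Sequence[float],
               nonmember_scores: Sequence[float],
               max_fpr: float = 0.01) -> float:
    """Best true-positive rate over thresholds whose false-positive rate is <= max_fpr."""
    best = 0.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


if __name__ == "__main__":
    # Toy per-token log-probabilities for one text (hypothetical values).
    print(min_k_score([-1.0, -4.0, -2.0, -3.0], k=0.5))
    # Toy membership scores for two members and two non-members.
    print(auroc([0.9, 0.8], [0.1, 0.85]))
    print(tpr_at_fpr([0.9, 0.8], [0.1, 0.85], max_fpr=0.0))
```

In practice the per-token log-probabilities would come from the target model, and an attack is usually judged over many samples; the brute-force loops here are meant for readability rather than speed.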
Related papers
- AttenMIA: LLM Membership Inference Attack through Attention Signals [8.170623979629953]
We introduce AttenMIA, a new MIA framework that exploits self-attention patterns inside the transformer model to infer membership. We show that attention-based features consistently outperform baselines, particularly under the important low-false-positive metric. We also show that using AttenMIA to replace other membership inference attacks in a data extraction framework results in training data extraction attacks that outperform the state of the art.
arXiv Detail & Related papers (2026-01-26T03:45:56Z) - Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models [38.27329422174473]
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset.
arXiv Detail & Related papers (2025-05-24T16:23:43Z) - Membership Inference Attacks on Large-Scale Models: A Survey [4.717839478553265]
Membership Inference Attacks (MIAs) are techniques used to determine whether a particular data point was part of a model's training set. MIAs are a key metric for assessing the privacy vulnerabilities of machine learning models. Despite extensive studies on MIAs in classic models, there remains a lack of systematic surveys addressing their effectiveness and limitations.
arXiv Detail & Related papers (2025-03-25T04:11:47Z) - Benchmarking Large and Small MLLMs [71.78055760441256]
Large multimodal language models (MLLMs) have achieved remarkable advancements in understanding and generating multimodal content. However, their deployment faces significant challenges, including slow inference, high computational cost, and impracticality for on-device applications. Small MLLMs, exemplified by the LLava-series models and Phi-3-Vision, offer promising alternatives with faster inference, reduced deployment costs, and the ability to handle domain-specific scenarios.
arXiv Detail & Related papers (2025-01-04T07:44:49Z) - EM-MIAs: Enhancing Membership Inference Attacks in Large Language Models through Ensemble Modeling [2.494935495983421]
This paper proposes a novel ensemble attack method (EM-MIAs) that integrates several existing MIA techniques into an XGBoost-based model to enhance overall attack performance. Experimental results demonstrate that the ensemble model significantly improves both AUC-ROC and accuracy compared to individual attack methods across various large language models and datasets.
arXiv Detail & Related papers (2024-12-23T03:47:54Z) - Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. EM-MIA achieves state-of-the-art results on WikiMIA.
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating different benchmarks and proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Do Membership Inference Attacks Work on Large Language Models? [141.2019867466968]
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a member of a target model's training data.
We perform a large-scale evaluation of MIAs over a suite of language models trained on the Pile, ranging from 160M to 12B parameters.
We find that MIAs barely outperform random guessing for most settings across varying LLM sizes and domains.
arXiv Detail & Related papers (2024-02-12T17:52:05Z) - FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning [17.141646895576145]
Federated Learning (FL) is a promising approach for training machine learning models on decentralized data. Membership Inference Attacks (MIAs) aim to determine whether a specific data point belongs to a target client's training set. We introduce a three-step Membership Inference Attack (MIA) method, called FedMIA, which follows the "all for one" principle, leveraging updates from all clients across multiple communication rounds to enhance MIA effectiveness.
arXiv Detail & Related papers (2024-02-09T09:58:35Z) - Embedding Attack Project (Work Report) [1.1406834504148182]
This report summarizes all the MIA experiments (Membership Inference Attacks) of the Embedding Attack Project.
Current results cover the evaluation of two main MIA strategies on 6 AI models ranging from Computer Vision to Language Modelling.
There are two ongoing experiments on MIA defense and neighborhood-comparison embedding attacks.
arXiv Detail & Related papers (2024-01-24T23:35:29Z) - Learning-Based Difficulty Calibration for Enhanced Membership Inference Attacks [3.470379197911889]
Membership Inference Attacks (MIAs) allow adversaries to determine whether a specific data point was part of a model's training dataset.
We present a novel approach to MIA that is aimed at significantly improving the True Positive Rate (TPR) at low False Positive Rate (FPR).
Experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs.
arXiv Detail & Related papers (2024-01-10T04:58:17Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)