Embedding Attack Project (Work Report)
- URL: http://arxiv.org/abs/2401.13854v1
- Date: Wed, 24 Jan 2024 23:35:29 GMT
- Title: Embedding Attack Project (Work Report)
- Authors: Jiameng Pu and Zafar Takhirov
- Abstract summary: This report summarizes all the MIA experiments (Membership Inference Attacks) of the Embedding Attack Project.
Current results cover the evaluation of two main MIA strategies on 6 AI models ranging from Computer Vision to Language Modelling.
There are two ongoing experiments on MIA defense and neighborhood-comparison embedding attacks.
- Score: 1.1406834504148182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report summarizes all the MIA experiments (Membership Inference Attacks)
of the Embedding Attack Project, including threat models, experimental setup,
experimental results, findings and discussion. Current results cover the
evaluation of two main MIA strategies (loss-based and embedding-based MIAs) on
6 AI models ranging from Computer Vision to Language Modelling. There are two
ongoing experiments on MIA defense and neighborhood-comparison embedding
attacks. These are ongoing projects.
The current work on MIA and PIA can be summarized into six conclusions: (1)
Amount of overfitting is directly proportional to model's vulnerability; (2)
early embedding layers in the model are less susceptible to privacy leaks; (3)
Deeper model layers contain more membership information; (4) Models are more
vulnerable to MIA if both embeddings and corresponding training labels are
compromised; (5) it is possible to use pseudo-labels to increase the MIA
success; and (6) although MIA and PIA success rates are proportional, reducing
the MIA does not necessarily reduce the PIA.
Related papers
- SoK: Challenges in Tabular Membership Inference Attacks [10.848042721721491]
Membership Inference Attacks (MIAs) are a dominant approach for evaluating privacy in machine learning applications.<n>In this paper, we provide an extensive review and analysis of MIAs considering two main learning paradigms: centralized and federated learning.<n>We show that even attacks with limited attack performance can still successfully expose a large portion of single-outs.
arXiv Detail & Related papers (2026-01-22T11:30:11Z) - Win-k: Improved Membership Inference Attacks on Small Language Models [0.0]
We study membership inference attacks (MIAs) on small language models (SLMs)<n>We propose a new MIA called win-k, which builds on top of a state-of-the-art attack (min-k)
arXiv Detail & Related papers (2025-08-02T08:50:42Z) - Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers [61.57691030102618]
We propose a novel jailbreaking method, Paper Summary Attack (llmnamePSA)<n>It synthesizes content from either attack-focused or defense-focused LLM safety paper to construct an adversarial prompt template.<n>Experiments show significant vulnerabilities not only in base LLMs, but also in state-of-the-art reasoning model like Deepseek-R1.
arXiv Detail & Related papers (2025-07-17T18:33:50Z) - MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z) - Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning [4.275908952997288]
Federated Learning (FL) is a collaborative learning framework designed to protect client data.<n>Despite FL's privacy-preserving goals, its distributed nature makes it particularly susceptible to model extraction attacks.<n>This paper examines the vulnerability of FL-based victim models to two types of model extraction attacks.
arXiv Detail & Related papers (2025-05-25T22:40:10Z) - Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models [38.27329422174473]
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs)<n>We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset.
arXiv Detail & Related papers (2025-05-24T16:23:43Z) - Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models [40.180771969531456]
Large language models (LLMs) have significantly influenced various industries but suffer from a critical flaw, the potential sensitivity of generating harmful content.
We developed and tested novel attack strategies on popular LLMs to expose their vulnerabilities in generating inappropriate content.
arXiv Detail & Related papers (2025-02-23T08:09:23Z) - Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities [49.09703018511403]
Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks.
Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system.
We propose evaluating LLMs with model tampering attacks which allow for modifications to latent activations or weights.
arXiv Detail & Related papers (2025-02-03T18:59:16Z) - Dual-Model Defense: Safeguarding Diffusion Models from Membership Inference Attacks through Disjoint Data Splitting [6.984396318800444]
Diffusion models have been proven to be vulnerable to Membership Inference Attacks (MIAs)
This paper introduces two novel and efficient approaches to protect diffusion models against MIAs.
arXiv Detail & Related papers (2024-10-22T03:02:29Z) - Evaluating Membership Inference Attacks and Defenses in Federated
Learning [23.080346952364884]
Membership Inference Attacks (MIAs) pose a growing threat to privacy preservation in federated learning.
This paper conducts an evaluation of existing MIAs and corresponding defense strategies.
arXiv Detail & Related papers (2024-02-09T09:58:35Z) - Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks aim to infer whether a target data record has been utilized for model training.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA)
arXiv Detail & Related papers (2023-11-10T13:55:05Z) - When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks [17.243744418309593]
We propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results.
We also explore potential strategies for mitigating privacy leakages.
arXiv Detail & Related papers (2023-11-07T10:28:17Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - Practical Membership Inference Attacks Against Large-Scale Multi-Modal
Models: A Pilot Study [17.421886085918608]
Membership inference attacks (MIAs) aim to infer whether a data point has been used to train a machine learning model.
These attacks can be employed to identify potential privacy vulnerabilities and detect unauthorized use of personal data.
This paper takes a first step towards developing practical MIAs against large-scale multi-modal models.
arXiv Detail & Related papers (2023-09-29T19:38:40Z) - Defending Pre-trained Language Models as Few-shot Learners against
Backdoor Attacks [72.03945355787776]
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners.
We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv Detail & Related papers (2023-09-23T04:41:55Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting.
In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP.
Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z) - RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.