Membership Inference Attacks on Tokenizers of Large Language Models
- URL: http://arxiv.org/abs/2510.05699v1
- Date: Tue, 07 Oct 2025 09:05:40 GMT
- Title: Membership Inference Attacks on Tokenizers of Large Language Models
- Authors: Meng Tong, Yuntao Du, Kejiang Chen, Weiming Zhang, Ninghui Li
- Abstract summary: We present the first study on membership leakage through tokenizers. We explore five attack methods to infer dataset membership. Our findings highlight tokenizers as an overlooked yet critical privacy threat.
- Score: 40.2492347972186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Membership inference attacks (MIAs) are widely used to assess the privacy risks associated with machine learning models. However, when these attacks are applied to pre-trained large language models (LLMs), they encounter significant challenges, including mislabeled samples, distribution shifts, and discrepancies in model size between experimental and real-world settings. To address these limitations, we introduce tokenizers as a new attack vector for membership inference. Specifically, a tokenizer converts raw text into tokens for LLMs. Unlike full models, tokenizers can be efficiently trained from scratch, thereby avoiding the aforementioned challenges. In addition, the tokenizer's training data is typically representative of the data used to pre-train LLMs. Despite these advantages, the potential of tokenizers as an attack vector remains unexplored. To this end, we present the first study on membership leakage through tokenizers and explore five attack methods to infer dataset membership. Extensive experiments on millions of Internet samples reveal the vulnerabilities in the tokenizers of state-of-the-art LLMs. To mitigate this emerging risk, we further propose an adaptive defense. Our findings highlight tokenizers as an overlooked yet critical privacy threat, underscoring the urgent need for privacy-preserving mechanisms specifically designed for them.
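The abstract leaves the five attack methods unnamed, so the following is only a minimal sketch of the underlying intuition, under my own assumptions rather than the paper's method: a BPE tokenizer's merge rules are fit to its training corpus, so member text should encode into fewer tokens per byte than comparable non-member text. The `train_bpe` helper and the reference-tokenizer baseline are hypothetical illustrations.

```python
# Sketch of a tokenizer-based membership signal (my illustration, not
# necessarily one of the paper's five attacks): text a BPE tokenizer was
# trained on tends to encode into fewer tokens per byte.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def train_bpe(corpus, vocab_size=30_000):
    """Train a small BPE tokenizer from scratch on an iterable of strings."""
    tok = Tokenizer(models.BPE(unk_token="[UNK]"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    tok.train_from_iterator(corpus, trainer=trainer)
    return tok

def tokens_per_byte(tok, text):
    return len(tok.encode(text).ids) / max(len(text.encode("utf-8")), 1)

def membership_score(target_tok, reference_tok, text):
    # Positive score: the target tokenizer compresses this text better than a
    # reference tokenizer trained on known non-member data, i.e. weak evidence
    # that the text (or text like it) was in the target's training set.
    return tokens_per_byte(reference_tok, text) - tokens_per_byte(target_tok, text)
```

A real attack would calibrate a decision threshold for `membership_score` on data known to be outside the target tokenizer's training set, rather than using zero.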
Related papers
- What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models [2.621142288968429]
Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in a model's training/fine-tuning dataset. We propose a novel membership inference approach that captures the token-level probabilities of low-confidence (hard) tokens. Experiments on both domain-specific medical datasets and general-purpose benchmarks demonstrate that HT-MIA consistently outperforms seven state-of-the-art MIA baselines.
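As a rough illustration of the hard-token idea (my reading of the summary, not the authors' exact HT-MIA procedure), one can score a sample by the mean log-probability the target model assigns to its k least-confident tokens; `gpt2` and `k=10` are placeholder choices.

```python
# Rough sketch of a hard-token membership score: average log-probability of
# the k least-confident tokens under the target model (higher = more
# member-like). Not the authors' exact scoring rule.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def hard_token_score(model, tokenizer, text, k=10):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability the model assigns to each actual next token.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs.gather(1, ids[0, 1:, None]).squeeze(1)
    # The "hard" tokens are those with the lowest log-probability.
    hard = torch.topk(-token_lp, k=min(k, token_lp.numel())).values
    return -hard.mean().item()

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
print(hard_token_score(model, tok, "Sample text to score."))
```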
arXiv Detail & Related papers (2026-01-27T22:31:10Z) - Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning [27.452191507918148]
Large language models (LLMs) exhibit powerful capabilities but risk memorizing sensitive personally identifiable information (PII) from their training data. We propose Data-Free Selective Unlearning (DFSU), a novel privacy-preserving framework that removes sensitive PII from an LLM without requiring its training data. Our approach first synthesizes pseudo-PII through language model inversion, then constructs token-level privacy masks for these synthetic samples, and finally performs token-level selective unlearning.
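A minimal sketch of just the final step, assuming the pseudo-PII synthesis and mask construction have already been done: token-level selective unlearning can be read as gradient ascent on only the masked tokens, with the ordinary language-modeling loss kept on the rest. The `pii_mask` input and the combined objective below are my assumptions, not the paper's exact formulation.

```python
# Sketch of token-level selective unlearning: minimizing this loss maximizes
# cross-entropy on tokens flagged by the (hypothetical) privacy mask while
# preserving ordinary next-token training everywhere else.
import torch.nn.functional as F

def selective_unlearning_loss(logits, labels, pii_mask):
    """logits: (B, T, V); labels: (B, T) token ids; pii_mask: (B, T), 1 = unlearn."""
    ce = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        reduction="none",
    ).view(labels.size(0), -1)
    mask = pii_mask[:, 1:].float()  # shift to align with next-token targets
    # Standard loss on ordinary tokens, negated loss (ascent) on PII tokens.
    return ((1 - mask) * ce - mask * ce).sum() / mask.numel()
```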
arXiv Detail & Related papers (2026-01-22T02:43:12Z) - No Query, No Access [50.18709429731724]
We introduce the Victim Data-based Adversarial Attack (VDBA), which operates using only victim texts. To prevent access to the victim model, we create a shadow dataset with publicly available pre-trained models and clustering methods. Experiments on the Emotion and SST5 datasets show that VDBA outperforms state-of-the-art methods, achieving an ASR improvement of 52.08%.
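A minimal sketch of the shadow-dataset step as the summary describes it, assuming a public sentence encoder and k-means: cluster ids act as pseudo-labels, so a shadow classifier can be trained without ever querying the victim. The encoder choice and `n_classes` are placeholders.

```python
# Sketch of shadow-dataset construction from victim texts alone: embed with a
# public pre-trained encoder, cluster, and use cluster ids as pseudo-labels.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def build_shadow_dataset(victim_texts, n_classes=5):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
    embeddings = encoder.encode(victim_texts)
    pseudo_labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(embeddings)
    return list(zip(victim_texts, pseudo_labels))
```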
arXiv Detail & Related papers (2025-05-12T06:19:59Z) - Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training [13.680205342714412]
Large language models (LLMs) have become the backbone of modern natural language processing but pose privacy concerns about leaking sensitive training data. We propose a lightweight yet effective empirical privacy defense that protects the training data of language models by leveraging token-specific characteristics.
arXiv Detail & Related papers (2025-02-27T03:37:45Z) - Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models [34.39913818362284]
Membership Inference Attacks (MIAs) aim to predict whether a data sample belongs to the model's training set or not. We propose PETAL: a label-only membership inference attack based on PEr-Token semAntic simiLarity.
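A sketch of the per-token semantic-similarity idea, under my assumptions rather than the paper's exact recipe: with label-only access the adversary sees generated tokens but no probabilities, so the similarity between the token the victim generates at each position and the ground-truth token can stand in as a probability proxy. The `generate_next` callback and encoder choice are hypothetical.

```python
# Sketch of a PETAL-like label-only score: average cosine similarity between
# the victim's generated next token and the true next token, used as a proxy
# for the per-token probability an adversary cannot observe.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

def petal_like_score(generate_next, target_tokens):
    """generate_next(prefix_tokens) -> the victim's next token as a string."""
    sims = []
    for i in range(1, len(target_tokens)):
        predicted = generate_next(target_tokens[:i])
        a, b = encoder.encode([predicted, target_tokens[i]])
        sims.append(float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))))
    return sum(sims) / len(sims)  # higher -> more member-like
```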
arXiv Detail & Related papers (2025-02-26T08:47:19Z) - A Method to Facilitate Membership Inference Attacks in Deep Learning Models [5.724311218570013]
We demonstrate a new form of membership inference attack that is strictly more powerful than prior art.
Our attack empowers the adversary to reliably de-identify all the training samples.
We show that the models can effectively disguise the amplified membership leakage under common membership privacy auditing.
arXiv Detail & Related papers (2024-07-02T03:33:42Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the common belief that base models, lacking alignment, pose little misuse risk.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning [8.084254242380057]
Membership Inference (MI) attacks seek to determine whether a particular data sample was included in a model's training dataset.
We show that existing label-only MI attacks are ineffective at inferring membership in the low False Positive Rate regime.
We propose a new attack, Chameleon, which leverages a novel adaptive data-poisoning strategy and an efficient query-selection method.
arXiv Detail & Related papers (2023-10-05T18:46:27Z) - Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks [72.03945355787776]
We advocate MDP, a lightweight, pluggable, and effective defense for PLMs as few-shot learners.
We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
arXiv Detail & Related papers (2023-09-23T04:41:55Z) - Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples [128.25509832644025]
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet.
UEs are training samples with invisible but unlearnable noise added, which has been found to prevent unauthorized training of machine learning models.
We present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations.
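A minimal sketch of cluster-wise perturbations, under my assumptions: group samples by k-means on surrogate features (no labels needed) and add one shared, small noise pattern per cluster. Real UCs optimize these perturbations; the random noise and `eps` below are placeholders.

```python
# Sketch of label-agnostic, cluster-wise perturbations: every sample in a
# cluster receives the same small additive noise pattern.
import numpy as np
from sklearn.cluster import KMeans

def unlearnable_clusters(images, features, n_clusters=10, eps=8 / 255):
    """images: (N, H, W, C) floats in [0, 1]; features: (N, d) surrogate embeddings."""
    ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    rng = np.random.default_rng(0)
    # One shared noise pattern per cluster, bounded to stay visually invisible.
    noise = rng.uniform(-eps, eps, size=(n_clusters,) + images.shape[1:])
    return np.clip(images + noise[ids], 0.0, 1.0)
```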
arXiv Detail & Related papers (2022-12-31T04:26:25Z) - Membership Inference Attacks Against Self-supervised Speech Models [62.73937175625953]
Self-supervised learning (SSL) on continuous speech has started gaining attention.
We present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access.
arXiv Detail & Related papers (2021-11-09T13:00:24Z) - Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
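A minimal sketch of the sampling idea as the summary describes it, with my own choice of perturbation: query the victim with many lightly perturbed copies of a sample and treat label stability as the membership signal, since training members tend to be classified more robustly. The word-dropout perturbation, query budget, and `query_label` callback are assumptions.

```python
# Sketch of a label-only sampling attack: membership score = fraction of
# perturbed queries whose predicted label matches the original prediction.
import random

def sampling_attack_score(query_label, text, n_queries=50, drop_p=0.1, seed=0):
    """query_label(text) -> the victim's predicted label (label-only API)."""
    rng = random.Random(seed)
    base = query_label(text)
    words = text.split()
    stable = 0
    for _ in range(n_queries):
        # Word dropout as a cheap, assumed perturbation scheme.
        kept = [w for w in words if rng.random() > drop_p] or words
        if query_label(" ".join(kept)) == base:
            stable += 1
    return stable / n_queries  # higher -> more member-like
```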
arXiv Detail & Related papers (2020-09-01T12:54:54Z)