De-Anonymizing Text by Fingerprinting Language Generation
- URL: http://arxiv.org/abs/2006.09615v2
- Date: Tue, 3 Nov 2020 04:47:25 GMT
- Title: De-Anonymizing Text by Fingerprinting Language Generation
- Authors: Zhen Sun, Roei Schuster, Vitaly Shmatikov
- Abstract summary: We show that the series of nucleus sizes produced while generating many natural English word sequences forms a unique fingerprint, and that an attacker can infer typed text by measuring these fingerprints via a suitable side channel.
We explain how this attack could help de-anonymize anonymous texts and discuss defenses.
- Score: 24.09735516192663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Components of machine learning systems are not (yet) perceived as security
hotspots. Secure coding practices, such as ensuring that no execution paths
depend on confidential inputs, have not yet been adopted by ML developers. We
initiate the study of code security of ML systems by investigating how nucleus
sampling---a popular approach for generating text, used for applications such
as auto-completion---unwittingly leaks texts typed by users. Our main result is
that the series of nucleus sizes for many natural English word sequences is a
unique fingerprint. We then show how an attacker can infer typed text by
measuring these fingerprints via a suitable side channel (e.g., cache access
times), explain how this attack could help de-anonymize anonymous texts, and
discuss defenses.
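The core observation above can be illustrated with a minimal sketch of how nucleus (top-p) sampling determines its candidate set. The nucleus is the smallest set of highest-probability tokens whose cumulative mass reaches p, so its size varies with the input context; the paper's fingerprint is the series of these sizes across a word sequence. The distributions and p value below are hypothetical, chosen only to show the effect; this is not the paper's implementation.

```python
import numpy as np

def nucleus_size(probs, p=0.95):
    """Size of the top-p nucleus: the smallest set of tokens whose
    cumulative probability reaches p."""
    sorted_probs = np.sort(probs)[::-1]      # sort descending
    cumulative = np.cumsum(sorted_probs)
    # First index where cumulative mass reaches p (0-based), so +1 gives the set size.
    return int(np.searchsorted(cumulative, p) + 1)

# Two hypothetical next-token distributions: a peaked one (predictable
# context) and a flat one (unpredictable context).
peaked = np.array([0.90, 0.05, 0.03, 0.01, 0.01])
flat   = np.array([0.25, 0.22, 0.20, 0.18, 0.15])

print(nucleus_size(peaked))  # small nucleus
print(nucleus_size(flat))    # large nucleus
```

Because the nucleus size depends on the preceding text, an attacker who can observe the sequence of nucleus sizes (e.g., via cache-timing differences in the sampling loop) learns information about what was typed.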
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z) - Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z) - OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning data samples, for example by inserting tokens or paraphrasing sentences.
Our main difference from previous work is that we use the repositioning of two words in a sentence as the trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z) - Punctuation Matters! Stealthy Backdoor Attack for Language Models [36.91297828347229]
A backdoored model produces normal outputs on clean samples while performing improperly on texts that contain the trigger.
Some attack methods even cause grammatical issues or change the semantic meaning of the original texts.
We propose a novel stealthy backdoor attack method against textual models, called PuncAttack.
arXiv Detail & Related papers (2023-12-26T03:26:20Z) - Reverse-Engineering Decoding Strategies Given Blackbox Access to a
Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically similar generations, which must be maintained by the language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Unsupervised Deep Keyphrase Generation [14.544869226959612]
Keyphrase generation aims to summarize long documents with a collection of salient phrases.
Deep neural models have demonstrated remarkable success in this task, capable of predicting keyphrases that are even absent from a document.
We present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any human annotation.
arXiv Detail & Related papers (2021-04-18T05:53:19Z) - Adversarial Watermarking Transformer: Towards Tracing Text Provenance
with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.