Differential Privacy and Natural Language Processing to Generate
Contextually Similar Decoy Messages in Honey Encryption Scheme
- URL: http://arxiv.org/abs/2010.15985v1
- Date: Thu, 29 Oct 2020 23:02:32 GMT
- Title: Differential Privacy and Natural Language Processing to Generate
Contextually Similar Decoy Messages in Honey Encryption Scheme
- Authors: Kunjal Panchal
- Abstract summary: Honey Encryption is an approach to encrypting messages using low min-entropy keys, such as weak passwords, OTPs, PINs, and credit card numbers.
The ciphertext, when decrypted with any incorrect key, produces plausible-looking but bogus plaintext called a "honey message".
A gibberish, random assortment of words is not enough to fool an attacker; it will not be acceptable or convincing, whether or not the attacker knows some information about the genuine source.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Honey Encryption is an approach to encrypting messages using low
min-entropy keys, such as weak passwords, OTPs, PINs, and credit card numbers.
The ciphertext, when decrypted with any incorrect key, produces
plausible-looking but bogus plaintext called a "honey message". But the
current techniques used to produce the decoy plaintexts do not model human
language entirely. A gibberish, random assortment of words is not enough to
fool an attacker; it will not be acceptable or convincing, whether or not the
attacker knows some information about the genuine source.
In this paper, I focus on plaintexts that are non-numeric informative
messages. In order to fool the attacker into believing that the decoy message
could actually come from a certain source, we need to capture the empirical
and contextual properties of the language. That is, there should be no
linguistic difference between real and fake messages, without revealing the
structure of the real message. I employ natural language processing and
generalized differential privacy to solve this problem. I mainly focus on
machine learning methods such as keyword extraction, context classification,
bags-of-words, word embeddings, and transformers for text processing to model
privacy for text documents. I then prove the security of this approach with
ε-differential privacy.
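The decoy mechanism described above follows the honey-encryption pattern: a distribution-transforming encoder (DTE) maps every seed in the seed space to some plausible message, so decryption under a wrong key yields a decoy rather than gibberish. The following is a minimal illustrative sketch, not the paper's actual construction: the message set, the equal-width seed slices, and the SHA-256-based mask are all simplifying assumptions (a real DTE allocates seed mass in proportion to estimated message probability, and a real scheme uses a proper KDF and cipher).

```python
import hashlib

# Hypothetical toy message space; a real DTE would model a language
# distribution rather than a fixed list.
MESSAGES = [
    "meet me at the station at noon",
    "the invoice was paid yesterday",
    "call the office before you leave",
    "the package arrives on friday",
]

SEED_SPACE = 2 ** 16  # deliberately tiny seed space for illustration

def encode(message: str) -> int:
    """Map a message into its slice of the seed space (toy DTE encode)."""
    slice_size = SEED_SPACE // len(MESSAGES)
    # Deterministic pick; a real DTE samples uniformly within the slice.
    return MESSAGES.index(message) * slice_size

def decode(seed: int) -> str:
    """Toy DTE decode: every seed maps to *some* plausible message."""
    slice_size = SEED_SPACE // len(MESSAGES)
    return MESSAGES[(seed // slice_size) % len(MESSAGES)]

def keystream(key: str) -> int:
    """Derive a mask from the key (illustrative only, not a real KDF)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:2], "big")

def encrypt(message: str, key: str) -> int:
    return (encode(message) + keystream(key)) % SEED_SPACE

def decrypt(ciphertext: int, key: str) -> str:
    seed = (ciphertext - keystream(key)) % SEED_SPACE
    return decode(seed)  # wrong key -> wrong seed -> plausible decoy
```

Decrypting with the correct key recovers the original message; decrypting with any other key lands on another entry of the message space, which is exactly the "honey message" behavior the abstract relies on. The paper's contribution is making those decoys contextually convincing via NLP models and ε-differential privacy, which this sketch does not attempt.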
Related papers
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs)
By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher [85.18213923151717]
Experimental results show certain ciphers succeed in bypassing the safety alignment of GPT-4 almost 100% of the time in several safety domains.
We propose a novel SelfCipher that uses only role play and several demonstrations in natural language to evoke this capability.
arXiv Detail & Related papers (2023-08-12T04:05:57Z)
- CipherSniffer: Classifying Cipher Types [0.0]
We frame the decryption task as a classification problem.
We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text.
arXiv Detail & Related papers (2023-06-13T20:18:24Z)
- General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling [15.136429369639686]
We propose a general framework to embed secret information into a given cover text.
The embedded information and the original cover text can be perfectly retrieved from the marked text.
Our results show that the original cover text and the secret information can be successfully embedded and extracted.
arXiv Detail & Related papers (2022-06-21T05:02:49Z)
- Can Sequence-to-Sequence Models Crack Substitution Ciphers? [15.898270650875158]
State-of-the-art decipherment methods use beam search and a neural language model to score candidate hypotheses for a given cipher.
We show that our proposed method can decipher text without explicit language identification and can still be robust to noise.
arXiv Detail & Related papers (2020-12-30T17:16:33Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
- Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding [88.31226340759892]
We present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model.
Human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.
arXiv Detail & Related papers (2020-10-01T20:40:23Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Privacy Guarantees for De-identifying Text Transformations [17.636430224292866]
We derive formal privacy guarantees for text transformation-based de-identification methods on the basis of Differential Privacy.
We compare a simple redact approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks.
We find that only word-by-word replacement is robust against performance drops in various tasks.
arXiv Detail & Related papers (2020-08-07T12:06:42Z)
- De-Anonymizing Text by Fingerprinting Language Generation [24.09735516192663]
We show how an attacker can infer typed text by measuring these fingerprints via a suitable side channel.
This attack could help de-anonymize anonymous texts; defenses are also discussed.
arXiv Detail & Related papers (2020-06-17T02:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.