Differential Privacy and Natural Language Processing to Generate
Contextually Similar Decoy Messages in Honey Encryption Scheme
- URL: http://arxiv.org/abs/2010.15985v1
- Date: Thu, 29 Oct 2020 23:02:32 GMT
- Title: Differential Privacy and Natural Language Processing to Generate
Contextually Similar Decoy Messages in Honey Encryption Scheme
- Authors: Kunjal Panchal
- Abstract summary: Honey Encryption is an approach to encrypting messages using low min-entropy keys, such as weak passwords, OTPs, PINs, and credit card numbers.
The ciphertext, when decrypted with any incorrect key, produces plausible-looking but bogus plaintext called a "honey message".
A gibberish, random assortment of words is not enough to fool an attacker; it will not be acceptable or convincing, whether or not the attacker knows some information about the genuine source.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Honey Encryption is an approach to encrypting messages using low
min-entropy keys, such as weak passwords, OTPs, PINs, and credit card numbers.
The ciphertext, when decrypted with any incorrect key, produces
plausible-looking but bogus plaintext called a "honey message". But the
current techniques used to produce the decoy plaintexts do not model human
language entirely. A gibberish, random assortment of words is not enough to
fool an attacker; it will not be acceptable or convincing, whether or not the
attacker knows some information about the genuine source.
In this paper, I focus on plaintexts that are non-numeric informative
messages. In order to fool the attacker into believing that the decoy message
could actually come from a certain source, we need to capture the empirical
and contextual properties of the language. That is, there should be no
linguistic difference between real and fake messages, without revealing the
structure of the real message. I employ natural language processing and
generalized differential privacy to solve this problem. I mainly focus on
machine learning methods such as keyword extraction, context classification,
bags-of-words, word embeddings, and transformers for text processing to model
privacy for text documents. I then prove the security of this approach with
ε-differential privacy.
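The decoy mechanism described above follows the honey-encryption pattern: a distribution-transforming encoder (DTE) maps every seed in the seed space to some plausible message, so decryption under a wrong key yields a decoy rather than gibberish. The following is a minimal illustrative sketch, not the paper's actual construction: the message set, the equal-width seed slices, and the SHA-256-based mask are all simplifying assumptions (a real DTE allocates seed mass in proportion to estimated message probability, and a real scheme uses a proper KDF and cipher).

```python
import hashlib

# Hypothetical toy message space; a real DTE would model a language
# distribution rather than a fixed list.
MESSAGES = [
    "meet me at the station at noon",
    "the invoice was paid yesterday",
    "call the office before you leave",
    "the package arrives on friday",
]

SEED_SPACE = 2 ** 16  # deliberately tiny seed space for illustration

def encode(message: str) -> int:
    """Map a message into its slice of the seed space (toy DTE encode)."""
    slice_size = SEED_SPACE // len(MESSAGES)
    # Deterministic pick; a real DTE samples uniformly within the slice.
    return MESSAGES.index(message) * slice_size

def decode(seed: int) -> str:
    """Toy DTE decode: every seed maps to *some* plausible message."""
    slice_size = SEED_SPACE // len(MESSAGES)
    return MESSAGES[(seed // slice_size) % len(MESSAGES)]

def keystream(key: str) -> int:
    """Derive a mask from the key (illustrative only, not a real KDF)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:2], "big")

def encrypt(message: str, key: str) -> int:
    return (encode(message) + keystream(key)) % SEED_SPACE

def decrypt(ciphertext: int, key: str) -> str:
    seed = (ciphertext - keystream(key)) % SEED_SPACE
    return decode(seed)  # wrong key -> wrong seed -> plausible decoy
```

Decrypting with the correct key recovers the original message; decrypting with any other key lands on another entry of the message space, which is exactly the "honey message" behavior the abstract relies on. The paper's contribution is making those decoys contextually convincing via NLP models and ε-differential privacy, which this sketch does not attempt.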
Related papers
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs)
By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher [85.18213923151717]
Experimental results show certain ciphers succeed in bypassing the safety alignment of GPT-4 almost 100% of the time in several safety domains.
We propose a novel SelfCipher that uses only role play and several demonstrations in natural language to evoke this capability.
arXiv Detail & Related papers (2023-08-12T04:05:57Z)
- CipherSniffer: Classifying Cipher Types [0.0]
We frame the decryption task as a classification problem.
We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text.
arXiv Detail & Related papers (2023-06-13T20:18:24Z)
- General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling [15.136429369639686]
We propose a general framework to embed secret information into a given cover text.
The embedded information and the original cover text can be perfectly retrieved from the marked text.
Our results show that the original cover text and the secret information can be successfully embedded and extracted.
arXiv Detail & Related papers (2022-06-21T05:02:49Z)
- Can Sequence-to-Sequence Models Crack Substitution Ciphers? [15.898270650875158]
State-of-the-art decipherment methods use beam search and a neural language model to score candidate hypotheses for a given cipher.
We show that our proposed method can decipher text without explicit language identification and can still be robust to noise.
arXiv Detail & Related papers (2020-12-30T17:16:33Z)
- TextHide: Tackling Data Privacy in Language Understanding Tasks [54.11691303032022]
TextHide mitigates privacy risks without slowing down training or reducing accuracy.
It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data.
We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend attacks on shared gradients or representations.
arXiv Detail & Related papers (2020-10-12T22:22:15Z)
- Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding [88.31226340759892]
We present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model.
Human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.
arXiv Detail & Related papers (2020-10-01T20:40:23Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Privacy Guarantees for De-identifying Text Transformations [17.636430224292866]
We derive formal privacy guarantees for text transformation-based de-identification methods on the basis of Differential Privacy.
We compare a simple redact approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks.
We find that only word-by-word replacement is robust against performance drops in various tasks.
arXiv Detail & Related papers (2020-08-07T12:06:42Z)
- De-Anonymizing Text by Fingerprinting Language Generation [24.09735516192663]
We show how an attacker can infer typed text by measuring these fingerprints via a suitable side channel.
This attack could help de-anonymize anonymous texts; defenses are also discussed.
arXiv Detail & Related papers (2020-06-17T02:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.