Can Sequence-to-Sequence Models Crack Substitution Ciphers?
- URL: http://arxiv.org/abs/2012.15229v1
- Date: Wed, 30 Dec 2020 17:16:33 GMT
- Title: Can Sequence-to-Sequence Models Crack Substitution Ciphers?
- Authors: Nada Aldarrab and Jonathan May
- Abstract summary: State-of-the-art decipherment methods use beam search and a neural language model to score candidate hypotheses for a given cipher.
We show that our proposed method can decipher text without explicit language identification and can still be robust to noise.
- Score: 15.898270650875158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decipherment of historical ciphers is a challenging problem. The language of
the target plaintext might be unknown, and ciphertext can have a lot of noise.
State-of-the-art decipherment methods use beam search and a neural language
model to score candidate plaintext hypotheses for a given cipher, assuming
plaintext language is known. We propose an end-to-end multilingual model for
solving simple substitution ciphers. We test our model on synthetic and real
historical ciphers and show that our proposed method can decipher text without
explicit language identification and can still be robust to noise.
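The simple substitution setting the paper targets is easy to make concrete: each plaintext letter is mapped through a fixed permutation of the alphabet, and decipherment amounts to recovering that permutation from ciphertext alone. A minimal sketch of the cipher itself (function names are illustrative, not from the paper):

```python
import random
import string

def make_key(seed=0):
    """A random substitution key: a permutation of the lowercase alphabet."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encipher(plaintext, key):
    """Apply the substitution; non-letters pass through unchanged."""
    return "".join(key.get(c, c) for c in plaintext.lower())

def decipher(ciphertext, key):
    """Invert the key to recover the plaintext."""
    inv = {v: k for k, v in key.items()}
    return "".join(inv.get(c, c) for c in ciphertext)

key = make_key(seed=42)
ciphertext = encipher("attack at dawn", key)
assert decipher(ciphertext, key) == "attack at dawn"
```

The attacker, of course, sees only `ciphertext`; the paper's contribution is a seq2seq model that recovers the plaintext without being told the key or even the plaintext language.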
Related papers
- Correcting Subverted Random Oracles [55.4766447972367]
We prove that a simple construction can transform a "subverted" random oracle which disagrees with the original one at a small fraction of inputs into an object that is indifferentiable from a random function.
Our results permit future designers of cryptographic primitives in typical kleptographic settings to use random oracles as a trusted black box.
arXiv Detail & Related papers (2024-04-15T04:01:50Z)
- Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher [85.18213923151717]
Experimental results show certain ciphers succeed almost 100% of the time to bypass the safety alignment of GPT-4 in several safety domains.
We propose a novel SelfCipher that uses only role play and several demonstrations in natural language to evoke this capability.
arXiv Detail & Related papers (2023-08-12T04:05:57Z)
- Classifying World War II Era Ciphers with Machine Learning [1.6317061277457]
We classify Enigma, M-209, Sigaba, Purple, and Typex ciphers from World War II era.
We find that classic machine learning models perform at least as well as deep learning models.
Ciphers that are more similar in design are somewhat more challenging to distinguish, but not as difficult as might be expected.
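A classic statistic that such machine-learning classifiers often build on is the letter-frequency profile of the ciphertext. A hedged sketch of this kind of feature extraction (not the paper's actual pipeline; the function name is illustrative):

```python
from collections import Counter

def freq_features(ciphertext, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Relative letter frequencies as a fixed-length feature vector,
    a traditional statistic for telling cipher machine outputs apart."""
    counts = Counter(c for c in ciphertext.lower() if c in alphabet)
    total = sum(counts.values()) or 1  # guard against empty input
    return [counts[c] / total for c in alphabet]

# Each ciphertext becomes a 26-dimensional vector suitable for any
# off-the-shelf classifier (SVM, random forest, etc.).
vector = freq_features("QJXZA PLOMW")
assert len(vector) == 26
```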
arXiv Detail & Related papers (2023-07-02T07:20:47Z)
- CipherSniffer: Classifying Cipher Types [0.0]
We frame the decryption task as a classification problem.
We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text.
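The listed transformations are all simple to generate synthetically, which is presumably how such a labeled dataset is built. A small sketch of a few of them (my own illustrative implementations, not the paper's code):

```python
def transpose(text, period=3):
    """Columnar transposition: read characters column by column."""
    return "".join(text[i::period] for i in range(period))

def text_reversal(text):
    """Reverse the entire string."""
    return text[::-1]

def word_reversal(text):
    """Reverse the letters of each word, keeping word order."""
    return " ".join(w[::-1] for w in text.split(" "))

def sentence_shift(text, k=1):
    """Rotate the words of the sentence by k positions."""
    words = text.split(" ")
    return " ".join(words[k:] + words[:k])

sample = "the quick brown fox"
labeled = {
    "transposition": transpose(sample),
    "text_reversal": text_reversal(sample),
    "word_reversal": word_reversal(sample),
    "sentence_shift": sentence_shift(sample),
    "plain": sample,
}
```

Pairs of (transformed text, label) from such generators would form the training data for the classification framing.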
arXiv Detail & Related papers (2023-06-13T20:18:24Z)
- Lexinvariant Language Models [84.2829117441298]
Token embeddings, a mapping from discrete lexical symbols to continuous vectors, are at the heart of any language model (LM).
We study lexinvariant language models that are invariant to lexical symbols and therefore do not need fixed token embeddings in practice.
We show that a lexinvariant LM can attain perplexity comparable to that of a standard language model, given a sufficiently long context.
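Lexical invariance here means the model's predictions are equivariant under any bijection of the vocabulary, which is exactly the transformation a substitution cipher applies. A small sketch of such a bijection over token ids (names are illustrative):

```python
import random

def random_bijection(vocab_size, seed=0):
    """A random permutation of token ids, i.e. a relabeling of the vocabulary."""
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    return perm

def apply_bijection(token_ids, perm):
    """Relabel every token through the permutation."""
    return [perm[t] for t in token_ids]

perm = random_bijection(100, seed=1)
seq = [3, 14, 15, 9, 3, 6]
mapped = apply_bijection(seq, perm)
# Repeats stay repeats and distinct ids stay distinct -- the co-occurrence
# structure a lexinvariant LM conditions on is preserved.
assert len(set(mapped)) == len(set(seq))
```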
arXiv Detail & Related papers (2023-05-24T19:10:46Z)
- Memorization for Good: Encryption with Autoregressive Language Models [8.645826579841692]
We propose the first symmetric encryption algorithm with autoregressive language models (SELM).
We show that autoregressive LMs can encode arbitrary data into a compact real-valued vector (i.e., encryption) and then losslessly decode the vector to the original message (i.e., decryption) via random subspace optimization and greedy decoding.
arXiv Detail & Related papers (2023-05-15T05:42:34Z)
- A Non-monotonic Self-terminating Language Model [62.93465126911921]
In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
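Nucleus (top-p) sampling, one of the incomplete decoding algorithms named above, illustrates the problem: at each step only the smallest high-probability prefix of the sorted vocabulary is eligible, so if the end-of-sequence token always falls outside that nucleus, generation never terminates. A minimal sketch (illustrative, not the paper's code):

```python
import random

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p sampling: keep the smallest prefix of tokens, sorted by
    probability, whose cumulative mass reaches p, then renormalize
    and sample from that truncated set only."""
    rng = rng or random.Random(0)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    mass, nucleus = 0.0, []
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in nucleus)
    r = rng.random() * total
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

# With p=0.7 only tokens 0 and 1 (mass 0.8) are ever eligible; token 3
# (say, end-of-sequence) can never be drawn, however many steps we take.
token = nucleus_sample([0.5, 0.3, 0.15, 0.05], p=0.7)
assert token in (0, 1)
```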
arXiv Detail & Related papers (2022-10-03T00:28:44Z)
- Segmenting Numerical Substitution Ciphers [27.05304607253758]
Deciphering historical substitution ciphers is a challenging problem.
We propose the first automatic methods to segment those ciphers using Byte Pair Encoding (BPE).
We also propose a method for solving non-deterministic ciphers with existing keys using a lattice and a pretrained language model.
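Byte Pair Encoding, the segmentation tool mentioned above, learns a vocabulary by repeatedly merging the most frequent adjacent symbol pair. A compact sketch of the merge-learning loop (a generic BPE illustration, not the paper's cipher-specific method):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges over a corpus of symbol sequences: at each step,
    find the most frequent adjacent pair and fuse it into one symbol."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        new_words = []
        for w in words:
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == (a, b):
                    out.append(w[i] + w[i + 1])
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges, words
```

Applied to numerical ciphertext, recurring digit pairs merge into multi-digit units, which is what makes the technique plausible for recovering cipher symbol boundaries.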
arXiv Detail & Related papers (2022-05-25T06:45:59Z)
- A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition [3.0682439731292592]
We propose a novel method for handwritten ciphers recognition based on few-shot object detection.
By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets.
arXiv Detail & Related papers (2020-09-26T11:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.