Pseudorandom Error-Correcting Codes
- URL: http://arxiv.org/abs/2402.09370v2
- Date: Tue, 18 Jun 2024 01:00:58 GMT
- Title: Pseudorandom Error-Correcting Codes
- Authors: Miranda Christ, Sam Gunn,
- Abstract summary: We build pseudorandom codes that are robust to cryptographic substitution and deletion errors.
We present an undetectable watermarking scheme for outputs of random substitutions and deletions.
Our second application is to steganography, where a secret message is hidden in innocent-looking content.
- Score: 0.716879432974126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We construct pseudorandom error-correcting codes (or simply pseudorandom codes), which are error-correcting codes with the property that any polynomial number of codewords are pseudorandom to any computationally-bounded adversary. Efficient decoding of corrupted codewords is possible with the help of a decoding key. We build pseudorandom codes that are robust to substitution and deletion errors, where pseudorandomness rests on standard cryptographic assumptions. Specifically, pseudorandomness is based on either $2^{O(\sqrt{n})}$-hardness of LPN, or polynomial hardness of LPN and the planted XOR problem at low density. As our primary application of pseudorandom codes, we present an undetectable watermarking scheme for outputs of language models that is robust to cropping and a constant rate of random substitutions and deletions. The watermark is undetectable in the sense that any number of samples of watermarked text are computationally indistinguishable from text output by the original model. This is the first undetectable watermarking scheme that can tolerate a constant rate of errors. Our second application is to steganography, where a secret message is hidden in innocent-looking content. We present a constant-rate stateless steganography scheme with robustness to a constant rate of substitutions. Ours is the first stateless steganography scheme with provable steganographic security and any robustness to errors.
Related papers
- New constructions of pseudorandom codes [23.22566380210149]
Pseudorandom error-correcting codes (PRCs) are a new cryptographic primitive with applications in watermarking generative AI models.
We show that if both the planted hyperloop assumption introduced in [BKR23] and security of a version of Goldreich's PRG hold, then there exist public-key PRCs for which no efficient adversary can distinguish a number of codewords from random.
We show how to construct secret-key PRCs of length $O(n)$ which are $textitunconditionally$ indistinguishable random by $textpoly(
arXiv Detail & Related papers (2024-09-11T19:14:39Z) - Watermarking Language Models with Error Correcting Codes [41.21656847672627]
We propose a watermarking framework that encodes statistical signals through an error correcting code.
Our method, termed robust binary code (RBC) watermark, introduces no distortion compared to the original probability distribution.
Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.
arXiv Detail & Related papers (2024-06-12T05:13:09Z) - Edit Distance Robust Watermarks for Language Models [29.69428894587431]
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees.
We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions.
arXiv Detail & Related papers (2024-06-04T04:03:17Z) - Correcting Subverted Random Oracles [55.4766447972367]
We prove that a simple construction can transform a "subverted" random oracle which disagrees with the original one at a small fraction of inputs into an object that is indifferentiable from a random function.
Our results permit future designers of cryptographic primitives in typical kleptographic settings to use random oracles as a trusted black box.
arXiv Detail & Related papers (2024-04-15T04:01:50Z) - Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z) - An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z) - Watermarking Text Generated by Black-Box Language Models [103.52541557216766]
A watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation.
A detection algorithm aware of the list can identify the watermarked text.
We develop a watermarking framework for black-box language model usage scenarios.
arXiv Detail & Related papers (2023-05-14T07:37:33Z) - A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.