Provably Secure Disambiguating Neural Linguistic Steganography
- URL: http://arxiv.org/abs/2403.17524v1
- Date: Tue, 26 Mar 2024 09:25:57 GMT
- Title: Provably Secure Disambiguating Neural Linguistic Steganography
- Authors: Yuang Qi, Kejiang Chen, Kai Zeng, Weiming Zhang, Nenghai Yu,
- Abstract summary: The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
- Score: 66.30965740387047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.
Related papers
- Edit Distance Robust Watermarks for Language Models [29.69428894587431]
Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees.
We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions.
arXiv Detail & Related papers (2024-06-04T04:03:17Z) - GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z) - Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks [39.51297217854375]
We propose Text-CRS, a certified robustness framework for natural language processing (NLP) based on randomized smoothing.
We show that Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement.
We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
arXiv Detail & Related papers (2023-07-31T13:08:16Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Uncertainty-aware Self-training for Low-resource Neural Sequence
Labeling [29.744621356187764]
This paper presents SeqUST, a novel uncertain-aware self-training framework for Neural sequence labeling (NSL)
We incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable language tokens from unlabeled data.
A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels.
arXiv Detail & Related papers (2023-02-17T02:40:04Z) - ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z) - A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text.
arXiv Detail & Related papers (2022-02-13T21:46:14Z) - Provably Secure Generative Linguistic Steganography [29.919406917681282]
We present a novel provably secure generative linguistic steganographic method ADG.
ADG embeds secret information by Adaptive Dynamic Grouping of tokens according to their probability given by an off-the-shelf language model.
arXiv Detail & Related papers (2021-06-03T17:27:10Z) - Consensus-Guided Correspondence Denoising [67.35345850146393]
We propose to denoise correspondences with a local-to-global consensus learning framework to robustly identify correspondence.
A novel "pruning" block is introduced to distill reliable candidates from initial matches according to their consensus scores estimated by dynamic graphs from local to global regions.
Our method outperforms state-of-the-arts on robust line fitting, wide-baseline image matching and image localization benchmarks by noticeable margins.
arXiv Detail & Related papers (2021-01-03T09:10:00Z) - Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.