A High-Capacity and Secure Disambiguation Algorithm for Neural Linguistic Steganography
- URL: http://arxiv.org/abs/2510.02332v1
- Date: Fri, 26 Sep 2025 05:56:32 GMT
- Title: A High-Capacity and Secure Disambiguation Algorithm for Neural Linguistic Steganography
- Authors: Yapei Feng, Feng Jiang, Shanhao Wu, Hua Zhong
- Abstract summary: Neural linguistic steganography aims to embed information into natural text while preserving statistical undetectability. A recent method, SyncPool, addresses tokenization ambiguity by employing a coarse-grained synchronization mechanism over groups of ambiguous candidates. We propose a method named look-ahead Sync, which overcomes the capacity limitation of SyncPool while retaining its provable security guarantees.
- Score: 5.002412595399629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural linguistic steganography aims to embed information into natural text while preserving statistical undetectability. A fundamental challenge in this field stems from tokenization ambiguity in modern tokenizers, which can lead to catastrophic decoding failures. The recent method, SyncPool, addresses this ambiguity by employing a coarse-grained synchronization mechanism over groups of ambiguous candidates. However, SyncPool sacrifices embedding capacity, as it utilizes the entire Shannon entropy of an ambiguous group solely for synchronization rather than for payload embedding. We propose a method named look-ahead Sync, which overcomes the capacity limitation of SyncPool while retaining its provable security guarantees. Our approach performs minimal synchronized sampling only on truly indistinguishable token sequences, while strategically preserving all other discernible paths to maximize embedding capacity. We provide theoretical proofs for the security of our method and analyze the gap between its achievable embedding capacity and the theoretical upper bound. Experiments on English (using Llama 3) and Chinese (using Qwen 2.5) benchmarks show that our method consistently approaches the theoretical capacity upper bound and significantly outperforms SyncPool. The improvement in embedding rate exceeds 160% in English and 25% in Chinese, particularly in settings with larger candidate pools. This work represents a significant step toward practical high-capacity provably secure linguistic steganography.
Related papers
- Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows [33.361153168706444]
We propose Deferred Commitment Decoding (DCD) as a training-free decoding strategy. DCD maintains a confidence-aware sliding window over masked tokens, resolving low-uncertainty tokens early while deferring high-uncertainty tokens until sufficient contextual evidence becomes available. Experiments show that DCD improves generation accuracy by 1.39% on average, with comparable decoding time, relative to fixed block-based diffusion methods, with the largest improvement reaching 9.0%.
arXiv Detail & Related papers (2026-01-05T12:57:33Z) - A high-capacity linguistic steganography based on entropy-driven rank-token mapping [81.29800498695899]
Linguistic steganography enables covert communication through embedding secret messages into innocuous texts. Traditional modification-based methods introduce detectable anomalies, while retrieval-based strategies suffer from low embedding capacity. We propose an entropy-driven framework called RTMStega that integrates rank-based adaptive coding and context-aware decompression with normalized entropy.
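The rank-token mapping idea can be illustrated with a generic, fixed-rate sketch (RTMStega itself adapts the number of bits per step to normalized entropy; this simplified version does not, and the probability table is a hard-coded stand-in for a language model's output):

```python
# Generic rank-based embedding sketch (not RTMStega's exact scheme):
# at each step, sort candidate tokens by model probability and let
# the next chunk of secret bits select a rank. The receiver, running
# the same model, recovers the rank (hence the bits) from the token.

def bits_to_rank(bits):
    """Interpret a bit string as an integer rank."""
    return int(bits, 2) if bits else 0

def embed_step(probs, bits_per_step, secret_bits):
    """Pick the token whose probability rank encodes the next bits."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chunk, rest = secret_bits[:bits_per_step], secret_bits[bits_per_step:]
    return ranked[bits_to_rank(chunk)], rest

def extract_step(probs, bits_per_step, token):
    """Recover the embedded bits from the observed token's rank."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return format(ranked.index(token), f"0{bits_per_step}b")

# Stand-in next-token distribution; bits "10" select rank 2 ("an").
probs = {"the": 0.4, "a": 0.3, "an": 0.2, "this": 0.1}
token, remaining = embed_step(probs, 2, "1001")
assert token == "an" and remaining == "01"
assert extract_step(probs, 2, token) == "10"
```

An entropy-driven variant would widen `bits_per_step` at high-entropy steps and shrink it at low-entropy ones, which is where the capacity gains of such frameworks come from.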
arXiv Detail & Related papers (2025-10-27T06:02:47Z) - SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z) - Efficient Decoding Methods for Language Models on Encrypted Data [32.58944595512403]
Homomorphic encryption (HE) enables computation on encrypted data for secure inference. Neural text generation requires decoding methods like argmax and sampling, which are non-polynomial and thus computationally expensive under encryption. We introduce cutmax, an HE-friendly argmax algorithm that reduces cipher operations compared to prior methods, enabling practical greedy decoding under encryption.
arXiv Detail & Related papers (2025-09-10T08:23:14Z) - Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
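The synchronization idea behind SyncPool can be sketched as follows. This is a simplified illustration in the spirit of the method, not the paper's exact construction: tokens whose surface strings could be confused after detokenization are merged into one group, the secret bit selects the group, and a PRNG seeded from a key shared by sender and receiver resolves the within-group choice, so no payload bits ride on the ambiguous decision.

```python
# Coarse-grained synchronized sampling sketch (simplified): payload
# bits choose among ambiguity *groups*; shared randomness that both
# parties can reproduce picks the concrete token inside a group.
import random

def sample_with_sync(groups, secret_bit, shared_key, step):
    """Embed one bit by choosing a group; resolve within-group
    ambiguity with a deterministically seeded, shared PRNG."""
    group = groups[secret_bit]                    # payload selects the group
    rng = random.Random(shared_key * 100003 + step)  # synchronized PRNG
    return rng.choice(group)                      # sync, not payload, picks token

groups = [["ri ver", "river"],   # mutually ambiguous after detokenization
          ["stream"]]            # unambiguous alternative
sender = sample_with_sync(groups, 0, shared_key=42, step=7)
receiver = sample_with_sync(groups, 0, shared_key=42, step=7)
assert sender == receiver        # both parties derive the same token
assert sender in groups[0]
```

Because the within-group choice consumes the group's entropy for synchronization rather than payload, capacity drops with group size; look-ahead Sync, per the abstract above, restricts this treatment to truly indistinguishable sequences.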
arXiv Detail & Related papers (2024-03-26T09:25:57Z) - Rethinking SIGN Training: Provable Nonconvex Acceleration without First-
and Second-Order Gradient Lipschitz [66.22095739795068]
Sign-based methods have gained attention due to their ability to achieve robust performance while using only the sign information for parameter updates.
The current convergence analysis of sign-based methods relies on the strong assumptions of first- and second-order gradient Lipschitz continuity.
In this paper, we analyze their convergence under weaker, more realistic assumptions.
arXiv Detail & Related papers (2023-10-23T06:48:43Z) - CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of $5.67\times$.
arXiv Detail & Related papers (2023-05-27T03:54:09Z) - Unbounded Gradients in Federated Learning with Buffered Asynchronous
Aggregation [0.6526824510982799]
The FedBuff algorithm allows asynchronous updates while preserving privacy via secure aggregation.
This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are considered.
arXiv Detail & Related papers (2022-10-03T18:20:48Z) - Efficient First-Order Contextual Bandits: Prediction, Allocation, and
Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.