A Content-Preserving Secure Linguistic Steganography
- URL: http://arxiv.org/abs/2511.12565v1
- Date: Sun, 16 Nov 2025 11:50:13 GMT
- Title: A Content-Preserving Secure Linguistic Steganography
- Authors: Lingyun Xiang, Chengfu Ou, Xu He, Zhongliang Yang, Yuling Liu,
- Abstract summary: We propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text.<n>We introduce CLstega, a novel method that embeds secret messages through controllable distribution transformation.<n> Experimental results show that CLstega can achieve a 100% extraction success rate, and outperforms existing methods in security.
- Score: 21.247775412166117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing linguistic steganography methods primarily rely on content transformations to conceal secret messages. However, they often cause subtle yet looking-innocent deviations between normal and stego texts, posing potential security risks in real-world applications. To address this challenge, we propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text. Based on this paradigm, we introduce CLstega (\textit{C}ontent-preserving \textit{L}inguistic \textit{stega}nography), a novel method that embeds secret messages through controllable distribution transformation. CLstega first applies an augmented masking strategy to locate and mask embedding positions, where MLM(masked language model)-predicted probability distributions are easily adjustable for transformation. Subsequently, a dynamic distribution steganographic coding strategy is designed to encode secret messages by deriving target distributions from the original probability distributions. To achieve this transformation, CLstega elaborately selects target words for embedding positions as labels to construct a masked sentence dataset, which is used to fine-tune the original MLM, producing a target MLM capable of directly extracting secret messages from the cover text. This approach ensures perfect security of secret messages while fully preserving the integrity of the original cover text. Experimental results show that CLstega can achieve a 100\% extraction success rate, and outperforms existing methods in security, effectively balancing embedding capacity and security.
Related papers
- STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model [71.35577462669856]
We propose a robust, provably secure linguistic steganography with diffusion language models (DLMs)<n>We introduce error correction strategies, including pseudo-random error correction and neighborhood search correction, during steganographic extraction.
arXiv Detail & Related papers (2026-01-21T08:58:12Z) - A high-capacity linguistic steganography based on entropy-driven rank-token mapping [81.29800498695899]
Linguistic steganography enables covert communication through embedding secret messages into innocuous texts.<n>Traditional modification-based methods introduce detectable anomalies, while retrieval-based strategies suffer from low embedding capacity.<n>We propose an entropy-driven framework called RTMStega that integrates rank-based adaptive coding and context-aware decompression with normalized entropy.
arXiv Detail & Related papers (2025-10-27T06:02:47Z) - The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that Language Large Models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts.<n>Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility.<n>We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z) - MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval [50.062817677022586]
Zero-Shot Image Retrieval (ZS-CIR) methods typically train adapters that convert reference images into pseudo-text tokens.<n>We propose MLLM-Guided VLM Fine-Tuning with Joint Inference (MVFT-JI) to construct two complementary training tasks using only unlabeled images.
arXiv Detail & Related papers (2025-05-26T08:56:59Z) - Robust Steganography from Large Language Models [1.5749416770494704]
We study the problem of robust steganography.<n>We design and implement our steganographic schemes that embed arbitrary secret messages into natural language text.
arXiv Detail & Related papers (2025-04-11T21:06:36Z) - Shifting-Merging: Secure, High-Capacity and Efficient Steganography via Large Language Models [25.52890764952079]
steganography offers a way to securely hide messages within innocent-looking texts.<n>Large Language Models (LLMs) provide high-quality and explicit distribution.<n>ShiMer pseudorandomly shifts the probability interval of the LLM's distribution to obtain a private distribution.
arXiv Detail & Related papers (2025-01-01T09:51:15Z) - Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search [64.15205542003056]
We introduce Attention-Guided Alignment (AGA) framework featuring two innovative components: Attention-Guided Mask (AGM) Modeling and Text Enrichment Module (TEM)<n>AGA achieves new state-of-the-art results with Rank-1 accuracy reaching 78.36%, 67.31%, and 67.4% on CUHK-PEDES, ICFG-PEDES, and RSTP, respectively.
arXiv Detail & Related papers (2024-12-19T17:51:49Z) - Effective Message Hiding with Order-Preserving Mechanisms [0.0]
StegaFormer is a framework designed to preserve bit order and enable global fusion between modalities.<n>StegaFormer surpasses existing state-of-the-art methods in terms of recovery accuracy, message capacity, and imperceptibility.
arXiv Detail & Related papers (2024-02-29T13:44:19Z) - Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs)
By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z) - Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z) - General Framework for Reversible Data Hiding in Texts Based on Masked
Language Modeling [15.136429369639686]
We propose a general framework to embed secret information into a given cover text.
The embedded information and the original cover text can be perfectly retrieved from the marked text.
Our results show that the original cover text and the secret information can be successfully embedded and extracted.
arXiv Detail & Related papers (2022-06-21T05:02:49Z) - Autoregressive Linguistic Steganography Based on BERT and Consistency
Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text.
Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload compared with many previous arts.
We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z) - Near-imperceptible Neural Linguistic Steganography via Self-Adjusting
Arithmetic Coding [88.31226340759892]
We present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model.
Human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.
arXiv Detail & Related papers (2020-10-01T20:40:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.