LLsM: Generative Linguistic Steganography with Large Language Model
- URL: http://arxiv.org/abs/2401.15656v3
- Date: Mon, 8 Apr 2024 03:50:39 GMT
- Title: LLsM: Generative Linguistic Steganography with Large Language Model
- Authors: Yihao Wang, Ruiqi Song, Ru Zhang, Jianyi Liu, Lingxiao Li,
- Abstract summary: Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information.
Existing LS methods do not consider the controllable generation of stegos containing specific discourses.
This paper proposes the LLsM, the first LS work with the Large Language Model (LLM)
- Score: 10.72286166021398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive the existence of the stegos and extract secrets, thereby preserving privacy. However, existing LS methods do not consider the controllable generation of stegos containing specific discourses such as style, genre, and theme. And they are difficult to simulate high-quality natural texts. As a result, the stegos are easily perceived and detectable, compromising covert communication. This paper proposes the LLsM, the first LS work with the Large Language Model (LLM). Regarding open-source LLMs, we reconstruct the token generator of LLM to the "stego generator" so that it can control the generation of stego based on the secret. In this "stego generator", the candidate pool is encoded by range coding, and the adjustment factor for the interval length is also given. The secret determines the interval, thereby determining the next token. This better simulates the distribution of natural texts and controls the adjustment of the embedding rate. In addition, we preliminarily built an LLsM-c architecture for closed-source LLMs. It encodes discourse to obtain high-quality prompts containing discourse based on secrets, and generates pure natural texts containing discourse. Experiments show that LLsM performs superior to prevalent LS and related-task baselines regarding various kinds of concealment and anti-steganalysis. LLsM's MAUVE surpasses baselines by 60%-80% and anti-steganalysis exceeds baselines by 20%-30%. Notably, LLsM can also generate longer stegos with high quality, showing its advantages in understanding and coherence.
Related papers
- IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation [70.2753541780788]
We introduce an Information Gain-based Decisiveness-aware Token handling (IGD) strategy that integrates token decisiveness into both tuning and decoding.<n>IGD consistently improves recommendation accuracy, achieving significant gains on widely used ranking metrics compared to strong baselines.
arXiv Detail & Related papers (2025-06-16T08:28:19Z) - Generating Long Semantic IDs in Parallel for Recommendation [29.97624755406803]
We propose RPG, a lightweight framework for semantic ID-based recommendation.<n>We train the model to predict each token independently using a multi-token prediction loss.<n> Experiments show that scaling up semantic ID length to 64 enables RPG to outperform generative baselines.
arXiv Detail & Related papers (2025-06-06T06:20:37Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation [37.63939774027709]
We propose enhancing the predicted sequence probability by assigning different weights to various tokens.
We refer to this new score as the Contextualized Sequence Likelihood (CSL)
arXiv Detail & Related papers (2024-06-03T21:55:07Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z) - GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z) - Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic
Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS)
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z) - Uncertainty-aware Self-training for Low-resource Neural Sequence
Labeling [29.744621356187764]
This paper presents SeqUST, a novel uncertain-aware self-training framework for Neural sequence labeling (NSL)
We incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable language tokens from unlabeled data.
A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels.
arXiv Detail & Related papers (2023-02-17T02:40:04Z) - Hiding Images in Deep Probabilistic Models [58.23127414572098]
We describe a different computational framework to hide images in deep probabilistic models.
Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution.
We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security.
arXiv Detail & Related papers (2022-10-05T13:33:25Z) - Beyond the Prototype: Divide-and-conquer Proxies for Few-shot
Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z) - BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation [92.75908003533736]
We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z) - Pyramid-BERT: Reducing Complexity via Successive Core-set based Token
Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT where we replace previously useds with a em core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, gives a space-efficient fine tuning, and thus makes it suitable to handle longer sequence lengths.
arXiv Detail & Related papers (2022-03-27T19:52:01Z) - Autoregressive Linguistic Steganography Based on BERT and Consistency
Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text.
Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload compared with many previous arts.
We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z) - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D
Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.