Dynamically Allocated Interval-Based Generative Linguistic Steganography with Roulette Wheel
- URL: http://arxiv.org/abs/2401.15656v5
- Date: Fri, 28 Mar 2025 02:24:55 GMT
- Title: Dynamically Allocated Interval-Based Generative Linguistic Steganography with Roulette Wheel
- Authors: Yihao Wang, Ruiqi Song, Lingxiao Li, Ru Zhang, Jianyi Liu
- Abstract summary: Existing linguistic steganography schemes often overlook the conditional probability (CP) of tokens in the candidate pool. This paper proposes a dynamically allocated interval-based scheme, called DAIRstega.
- Score: 10.72286166021398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing linguistic steganography schemes often overlook the conditional probability (CP) of tokens in the candidate pool: they assign the same coding to every token, giving all tokens an identical likelihood of being selected. As a result, low-CP tokens are frequently chosen, which degrades the quality of the stegos and makes them easier to detect. This paper proposes an interval-allocation-based scheme called DAIRstega. DAIRstega first uses a portion of the read secret to build the roulette area. It then applies the roulette-wheel idea, taking the CPs of tokens as the main basis for allocating the roulette area (i.e., the interval lengths), so that tokens with larger CPs receive more area and the secret bits are more likely to select a high-CP token. To optimize the allocation, we design several allocation functions and three constraints. Additionally, DAIRstega supports prompt-based controllable generation of stegos. Extensive experiments show that the proposed embedding approach and DAIRstega outperform existing approaches and baselines, demonstrating strong perceptual, statistical, and semantic concealment as well as anti-steganalysis ability. DAIRstega can also generate high-quality long stegos, addressing a deficiency of prior work, and is confirmed to have potential as a secure watermarking scheme.
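To make the embedding step concrete, below is a minimal Python sketch of one roulette-wheel selection: intervals over a fixed integer range are allocated in proportion to each candidate token's CP, and a block of secret bits picks the token whose interval contains it. The fixed block size k, the largest-remainder rounding, and all names are illustrative assumptions; the paper's specific allocation functions, its three constraints, and the extraction side are not reproduced here.

```python
def allocate_intervals(probs, k):
    """Split the integer range [0, 2**k) into one sub-interval per token,
    with lengths roughly proportional to each token's conditional
    probability, so high-CP tokens cover more of the range.
    Assumes len(probs) <= 2**k so every token can keep length >= 1."""
    total = 2 ** k
    raw = [p * total for p in probs]
    lengths = [max(1, int(r)) for r in raw]
    # Largest-remainder fix-up so lengths sum exactly to 2**k.
    drift = total - sum(lengths)
    order = sorted(range(len(probs)), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    i = 0
    while drift != 0:
        idx = order[i % len(order)]
        step = 1 if drift > 0 else -1
        if lengths[idx] + step >= 1:
            lengths[idx] += step
            drift -= step
        i += 1
    # Prefix sums give each token a half-open interval [lo, hi).
    intervals, lo = [], 0
    for length in lengths:
        intervals.append((lo, lo + length))
        lo += length
    return intervals

def embed_step(probs, secret_bits, k):
    """Read k secret bits and select the token whose interval contains them."""
    value = int(secret_bits[:k], 2)
    for token_id, (lo, hi) in enumerate(allocate_intervals(probs, k)):
        if lo <= value < hi:
            return token_id, secret_bits[k:]  # remaining bits carry over

probs = [0.55, 0.25, 0.12, 0.08]        # toy candidate-pool CPs
token, rest = embed_step(probs, "1011010", k=4)
print(token, rest)                       # high-CP tokens are favoured
```

In a complete scheme the number of bits consumed per step would have to depend on the chosen interval so the receiver can decode unambiguously; this sketch only illustrates why tokens with larger CPs become more likely to be selected.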
Related papers
- Coin selection by Random Draw according to the Boltzmann distribution [0.13048920509133805]
We propose the Boltzmann Draw, a probabilistic algorithm inspired by the principles of statistical physics. The algorithm draws tokens according to the Boltzmann distribution, serving as an extension and improvement of the Random Draw method. Numerical results demonstrate the effectiveness of our method in bounding the number of selected input tokens, reducing dust generation, and limiting the token pool size in the wallet.
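As an illustration of the core draw, here is a minimal sketch that samples an index with probability proportional to exp(v/T). Whether the paper exponentiates coin values directly or a negative energy, and how it sets the temperature, is not specified in the summary, so those choices are assumptions.

```python
import math
import random

def boltzmann_draw(values, temperature, rng=random):
    """Draw one index with probability proportional to exp(v / T).
    Higher temperature flattens the distribution toward uniform;
    lower temperature concentrates mass on the largest values."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(values) - 1  # guard against float round-off

utxos = [0.01, 0.05, 0.2, 1.5]          # hypothetical coin values
picked = boltzmann_draw(utxos, temperature=0.5)
print(utxos[picked])
```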
arXiv Detail & Related papers (2026-02-19T16:00:21Z) - Beyond Imitation: Reinforcement Learning for Active Latent Planning [18.05072303874982]
Latent reasoning methods fine-tune Large Language Models to substitute discrete language tokens with continuous latent tokens. Current latent tokens are generally supervised by imitating language labels. We propose ATP-Latent, which models the supervision of latent tokens as a conditional variational auto-encoder.
arXiv Detail & Related papers (2026-01-29T12:07:16Z) - LMK > CLS: Landmark Pooling for Dense Embeddings [18.49372789918725]
We introduce Landmark (LMK) pooling, which partitions a sequence into chunks, inserts landmark tokens between chunks, and forms the final representation by mean-pooling the landmark token embeddings. This simple mechanism improves long-context extrapolation without sacrificing local salient features, at the cost of introducing a small number of special tokens.
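A rough sketch of the pooling mechanism as described; the chunk size, the landmark placement, and the reserved landmark id are all illustrative assumptions, and the encoder itself is stood in by random hidden states.

```python
import numpy as np

LANDMARK_ID = 0  # hypothetical reserved id for the special landmark token

def insert_landmarks(token_ids, chunk_size):
    """Partition a sequence into fixed-size chunks and insert one
    landmark token after each chunk (the paper's exact placement and
    chunking rule may differ; this is an illustrative choice)."""
    out, landmark_pos = [], []
    for start in range(0, len(token_ids), chunk_size):
        out.extend(token_ids[start:start + chunk_size])
        landmark_pos.append(len(out))
        out.append(LANDMARK_ID)
    return out, landmark_pos

def landmark_pool(hidden_states, landmark_pos):
    """Final dense embedding = mean of the landmark-token embeddings."""
    return hidden_states[landmark_pos].mean(axis=0)

ids = list(range(1, 11))                  # toy token ids 1..10
seq, pos = insert_landmarks(ids, chunk_size=4)
hidden = np.random.randn(len(seq), 8)     # stand-in for encoder output
print(landmark_pool(hidden, pos).shape)   # (8,)
```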
arXiv Detail & Related papers (2026-01-29T10:40:37Z) - SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs [59.415473779171315]
We propose a novel visual token pruning strategy called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE).
arXiv Detail & Related papers (2025-10-28T09:29:37Z) - Exploiting Discriminative Codebook Prior for Autoregressive Image Generation [54.14166700058777]
Token-based autoregressive image generation systems first tokenize images into sequences of token indices with a codebook, and then model these sequences in an autoregressive paradigm. While autoregressive generative models are trained only on index values, the prior encoded in the codebook, which contains rich token-similarity information, is left unexploited. Recent studies have attempted to incorporate this prior by performing naive k-means clustering on the tokens, which facilitates training generative models with a reduced codebook. We propose the Discriminative Codebook Prior Extractor (DCPE) as an alternative to k-means.
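For reference, a minimal sketch of the naive k-means codebook reduction the summary cites as the prior baseline (not DCPE itself, whose construction the summary does not detail). The codebook here is random toy data.

```python
import numpy as np

def kmeans_reduce_codebook(codebook, n_clusters, iters=50, seed=0):
    """Naive k-means over codebook embeddings: returns a coarse
    codebook of cluster centroids plus a map from each original token
    index to its cluster id (the baseline step DCPE replaces)."""
    rng = np.random.default_rng(seed)
    centroids = codebook[rng.choice(len(codebook), n_clusters, replace=False)]
    for _ in range(iters):
        # Assign each code vector to its nearest centroid.
        d = np.linalg.norm(codebook[:, None] - centroids[None], axis=-1)
        assign = d.argmin(axis=1)
        # Recompute centroids; leave empty clusters where they are.
        for c in range(n_clusters):
            members = codebook[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, assign

codebook = np.random.randn(1024, 16).astype(np.float32)  # toy VQ codebook
coarse, token_to_cluster = kmeans_reduce_codebook(codebook, n_clusters=64)
print(coarse.shape, token_to_cluster.shape)               # (64, 16) (1024,)
```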
arXiv Detail & Related papers (2025-08-14T15:00:00Z) - IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation [70.2753541780788]
We introduce an Information Gain-based Decisiveness-aware Token handling (IGD) strategy that integrates token decisiveness into both tuning and decoding. IGD consistently improves recommendation accuracy, achieving significant gains on widely used ranking metrics compared to strong baselines.
arXiv Detail & Related papers (2025-06-16T08:28:19Z) - Generating Long Semantic IDs in Parallel for Recommendation [29.97624755406803]
We propose RPG, a lightweight framework for semantic ID-based recommendation. We train the model to predict each token independently using a multi-token prediction loss. Experiments show that scaling the semantic ID length up to 64 enables RPG to outperform generative baselines.
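A minimal sketch of what an independent multi-token prediction loss can look like, assuming one linear head per semantic-ID position over a shared context vector; RPG's actual architecture and loss details may differ.

```python
import torch
import torch.nn.functional as F

def multi_token_prediction_loss(context, heads, target_ids):
    """Predict every position of a long semantic ID in parallel: one
    linear head per position, each trained with an independent
    cross-entropy (no left-to-right conditioning between tokens)."""
    loss = 0.0
    for pos, head in enumerate(heads):
        logits = head(context)                      # (batch, vocab)
        loss = loss + F.cross_entropy(logits, target_ids[:, pos])
    return loss / len(heads)

# Toy setup: batch of 4 users, semantic IDs of length 8 over a 256-way vocab.
hidden, id_len, vocab = 32, 8, 256
context = torch.randn(4, hidden)
heads = [torch.nn.Linear(hidden, vocab) for _ in range(id_len)]
targets = torch.randint(0, vocab, (4, id_len))
print(multi_token_prediction_loss(context, heads, targets).item())
```

Because the positions are predicted independently, all tokens of the ID can be generated in a single forward pass instead of one autoregressive step per token.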
arXiv Detail & Related papers (2025-06-06T06:20:37Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
Extensive experiments show a speedup of 1.9x-3x across several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation [37.63939774027709]
We propose enhancing the predicted sequence probability by assigning different weights to various tokens.
We refer to this new score as the Contextualized Sequence Likelihood (CSL).
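A toy sketch of the weighted-likelihood idea, assuming the token weights are given as an importance vector (the paper derives the weights from the model itself, which is not reproduced here).

```python
import numpy as np

def contextualized_sequence_likelihood(token_logprobs, weights):
    """Weighted average of per-token log-probabilities. Plain sequence
    likelihood corresponds to uniform weights; a CSL-style score
    up-weights the tokens deemed most important."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize so scores are comparable
    return float(np.dot(w, token_logprobs))

logps = np.log([0.9, 0.6, 0.05, 0.8])    # toy per-token probabilities
uniform = contextualized_sequence_likelihood(logps, [1, 1, 1, 1])
weighted = contextualized_sequence_likelihood(logps, [0.1, 0.2, 0.6, 0.1])
print(uniform, weighted)                 # the rare token dominates when up-weighted
```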
arXiv Detail & Related papers (2024-06-03T21:55:07Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
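A minimal sketch of the synchronized-sampling idea as the summary describes it: secret bits choose only among merged groups of potentially ambiguous candidates, while shared randomness resolves the token inside the group. The grouping shown, the power-of-two group count, and the shared-seed derivation are all illustrative assumptions, not SyncPool's actual construction.

```python
import math
import random

def syncpool_select(groups, secret_bits, shared_seed, step):
    """Secret bits select a *group* (a pool of tokens that could be
    confused after subword segmentation); a PRNG seeded with a key
    shared by sender and receiver picks the concrete token inside the
    group. The inner choice carries no secret bits, so segmentation
    ambiguity cannot corrupt decoding, and sampling by the tokens' own
    probabilities leaves the within-group distribution unchanged."""
    k = int(math.log2(len(groups)))             # assume power-of-two groups
    g = int(secret_bits[:k], 2)
    tokens, probs = zip(*groups[g])
    rng = random.Random(f"{shared_seed}:{step}")  # both parties derive this
    token = rng.choices(tokens, weights=probs, k=1)[0]
    return token, secret_bits[k:]

groups = [[("cat", 0.7), ("ca t", 0.1)],        # toy ambiguity-merged pools
          [("dog", 0.15), ("do g", 0.05)]]
print(syncpool_select(groups, "10110", shared_seed=42, step=0))
```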
arXiv Detail & Related papers (2024-03-26T09:25:57Z) - GEC-DePenD: Non-Autoregressive Grammatical Error Correction with
Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into separate permutation and decoding components.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z) - Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation [59.37587762543934]
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS).
Existing methods suffer from a granularity inconsistency regarding the usage of group tokens.
We propose the prototypical guidance network (PGSeg) that incorporates multi-modal regularization.
arXiv Detail & Related papers (2023-10-29T13:18:00Z) - Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling [29.744621356187764]
This paper presents SeqUST, a novel uncertainty-aware self-training framework for neural sequence labeling (NSL).
We incorporate Monte Carlo (MC) dropout into a Bayesian neural network (BNN) to perform uncertainty estimation at the token level, and then select reliable tokens from unlabeled data (sketched below).
A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels.
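A minimal sketch of token-level MC dropout as described, using a toy classifier. The number of stochastic passes and the entropy threshold are illustrative assumptions, and SeqUST's masked sequence labeling task and noise-robust loss are not shown.

```python
import torch

@torch.no_grad()
def mc_dropout_token_uncertainty(model, inputs, n_samples=10):
    """Monte Carlo dropout at the token level: keep dropout active at
    inference, run several stochastic forward passes, average the
    per-token class probabilities, and score uncertainty as predictive
    entropy. Low-entropy tokens are the 'reliable' ones a self-training
    loop would keep as pseudo-labels."""
    model.train()                          # leaves dropout layers active
    probs = torch.stack([
        torch.softmax(model(inputs), dim=-1) for _ in range(n_samples)
    ]).mean(dim=0)                         # (batch, seq_len, n_classes)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return probs.argmax(dim=-1), entropy   # pseudo-labels + per-token score

# Toy stand-in for a token classifier with dropout.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Dropout(0.3),
                            torch.nn.ReLU(), torch.nn.Linear(32, 5))
tokens = torch.randn(2, 7, 16)             # (batch, seq_len, hidden)
labels, scores = mc_dropout_token_uncertainty(model, tokens)
reliable = scores < 0.5                     # hypothetical entropy threshold
print(labels.shape, reliable.float().mean().item())
```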
arXiv Detail & Related papers (2023-02-17T02:40:04Z) - Hiding Images in Deep Probabilistic Models [58.23127414572098]
We describe a different computational framework to hide images in deep probabilistic models.
Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution.
We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security.
arXiv Detail & Related papers (2022-10-05T13:33:25Z) - Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z) - BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation [92.75908003533736]
We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z) - Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, in which we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique avoids expensive pre-training, enables space-efficient fine-tuning, and thus makes the model suitable for longer sequence lengths.
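A sketch of one classic greedy k-center core-set selection over token embeddings; whether Pyramid-BERT uses exactly this rule is not stated in the summary, so treat the details as assumptions.

```python
import numpy as np

def coreset_select(token_embs, keep, always_keep=(0,)):
    """Greedy k-center core-set selection: start from tokens that must
    survive (e.g. [CLS] at index 0), then repeatedly add the token
    farthest from the current selection, so the kept subset covers the
    embedding space. Pyramid-BERT applies such a reduction layer by
    layer; its exact selection rule may differ from this variant."""
    selected = list(always_keep)
    # Distance from every token to its nearest already-selected token.
    dists = np.linalg.norm(token_embs - token_embs[selected[0]], axis=1)
    for idx in selected[1:]:
        dists = np.minimum(dists, np.linalg.norm(token_embs - token_embs[idx], axis=1))
    while len(selected) < keep:
        far = int(dists.argmax())
        selected.append(far)
        dists = np.minimum(dists, np.linalg.norm(token_embs - token_embs[far], axis=1))
    return sorted(selected)

embs = np.random.randn(128, 64)           # one layer's token embeddings
kept = coreset_select(embs, keep=32)      # shrink 128 tokens down to 32
print(len(kept), kept[:5])
```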
arXiv Detail & Related papers (2022-03-27T19:52:01Z) - Autoregressive Linguistic Steganography Based on BERT and Consistency Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text.
Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload compared with many earlier approaches.
We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z) - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.