Semantic-Preserving Linguistic Steganography by Pivot Translation and
Semantic-Aware Bins Coding
- URL: http://arxiv.org/abs/2203.03795v1
- Date: Tue, 8 Mar 2022 01:35:05 GMT
- Title: Semantic-Preserving Linguistic Steganography by Pivot Translation and
Semantic-Aware Bins Coding
- Authors: Tianyu Yang, Hanzhou Wu, Biao Yi, Guorui Feng and Xinpeng Zhang
- Abstract summary: Linguistic steganography (LS) aims to embed secret information into a highly encoded text for covert communication.
We propose a novel LS method to modify a given text by pivoting it between two different languages.
- Score: 45.13432859384438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linguistic steganography (LS) aims to embed secret information into a highly
encoded text for covert communication. It can be roughly divided into two main
categories, i.e., modification based LS (MLS) and generation based LS (GLS).
Unlike MLS, which hides secret data by slightly modifying a given text without
impairing its meaning, GLS uses a trained language model to directly generate
a text carrying the secret data. A common disadvantage of MLS methods is that
the embedding payload is very low, which is the price paid in return for well
preserving the semantic quality of the text. In contrast, GLS allows the data
hider to embed a high payload, but at the high price of uncontrollable semantics.
In this paper, we propose a novel LS method that modifies a given text by
pivoting it between two different languages and embeds secret data by applying
a GLS-like information encoding strategy. Our purpose is to alter the expression
of the given text, enabling a high payload to be embedded while keeping the
semantic information unchanged. Experimental results show that the proposed
method not only achieves a high embedding payload, but also shows superior
performance in maintaining semantic consistency and resisting linguistic
steganalysis.
Related papers
- Generative Text Steganography with Large Language Model [10.572149957139736]
A black-box generative text steganographic method, called LLM-Stega, is built on the user interfaces of large language models.
We first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages.
Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-16T02:19:28Z) - MultiLS: A Multi-task Lexical Simplification Framework [21.81108113189197]
We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset.
We also present MultiLS-PT, the first dataset to be created using the MultiLS framework.
arXiv Detail & Related papers (2024-02-22T21:16:18Z) - Gloss-free Sign Language Translation: Improving from Visual-Language
Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pretext tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained visual encoder and text decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z) - Better Sign Language Translation with Monolingual Data [6.845232643246564]
Sign language translation (SLT) systems rely heavily on the availability of large-scale parallel gloss-to-text (G2T) pairs.
This paper proposes a simple and efficient rule transformation method to transcribe the large-scale target monolingual data into its pseudo glosses automatically.
Empirical results show that the proposed approach can significantly improve the performance of SLT.
arXiv Detail & Related papers (2023-04-21T09:39:54Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating, for a document in a source language, a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual sentence encoders for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - Autoregressive Linguistic Steganography Based on BERT and Consistency
Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text.
Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload compared with many previous arts.
We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z) - Improving Sign Language Translation with Monolingual Data by Sign
Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
arXiv Detail & Related papers (2021-05-26T08:49:30Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language; a generic sketch of such a loss is given after this list.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.