General Framework for Reversible Data Hiding in Texts Based on Masked
Language Modeling
- URL: http://arxiv.org/abs/2206.10112v1
- Date: Tue, 21 Jun 2022 05:02:49 GMT
- Title: General Framework for Reversible Data Hiding in Texts Based on Masked
Language Modeling
- Authors: Xiaoyan Zheng, Yurun Fang and Hanzhou Wu
- Abstract summary: We propose a general framework to embed secret information into a given cover text.
The embedded information and the original cover text can be perfectly retrieved from the marked text.
Our results show that the secret information can be successfully embedded and that both it and the original cover text can be recovered from the marked text.
- Score: 15.136429369639686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of natural language processing, recent
advances in information hiding focus on covertly embedding secret information
into texts. These algorithms either modify a given cover text or directly
generate a text containing secret information; however, they are not
reversible, meaning that the original text not carrying secret information
cannot be perfectly recovered unless a large amount of side information is
shared in advance. To tackle this problem, we propose a general framework to
embed secret information into a given cover text such that both the embedded
information and the original cover text can be perfectly retrieved from the
marked text. The main idea of the proposed method is to use a masked language
model to generate a marked text in which the cover text can be reconstructed
by collecting the words at certain positions, while the words at the remaining
positions can be processed to extract the secret information. Experimental
results show that the secret information can be successfully embedded and that
both it and the original cover text can be extracted without error. Meanwhile,
the marked text carrying the secret information has good fluency and semantic
quality, indicating that the proposed method achieves satisfactory security,
as verified by our experiments. Furthermore, there is no need for the data
hider and the data receiver to share the language model, which significantly
reduces the required side information and gives the method good potential for
practical applications.
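To make the embed/extract round trip concrete, here is a minimal sketch of one way such a scheme can work. It assumes a shared off-the-shelf masked language model (note that the paper's actual framework removes the need to share the model), interleaves cover words with MLM-filled slots, and encodes one secret bit per slot via the rank of the chosen word among the top-2 predictions; the layout and all names below are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch of MLM-based reversible text hiding (illustrative only).
# Assumptions not taken from the paper: both parties share bert-base-uncased,
# cover words sit at odd positions after [CLS], and each MLM-filled slot
# carries one bit via the rank (0 or 1) of the chosen word among the top-2
# predictions for that slot.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def _top2(token_ids, pos):
    """Top-2 candidate ids for `pos`, which must currently hold [MASK]."""
    with torch.no_grad():
        logits = mlm(input_ids=torch.tensor([token_ids])).logits
    return logits[0, pos].topk(2).indices.tolist()

def embed(cover: str, bits: list[int]) -> list[int]:
    """Interleave cover words with MLM-chosen stego words (1 bit per slot)."""
    cover_ids = tok(cover, add_special_tokens=False)["input_ids"]
    assert len(bits) == len(cover_ids), "one bit per stego slot in this sketch"
    ids = [tok.cls_token_id]
    for c in cover_ids:
        ids += [c, tok.mask_token_id]      # [CLS] c0 MASK c1 MASK ... [SEP]
    ids.append(tok.sep_token_id)
    for i, bit in enumerate(bits):
        pos = 2 + 2 * i                    # i-th stego slot
        ids[pos] = _top2(ids, pos)[bit]    # candidate rank encodes the bit
    return ids

def extract(marked_ids: list[int]):
    """Recover the cover text and the bits by replaying the masking order."""
    n = (len(marked_ids) - 2) // 2
    cover_ids = [marked_ids[1 + 2 * i] for i in range(n)]
    ids = [tok.cls_token_id]
    for c in cover_ids:
        ids += [c, tok.mask_token_id]
    ids.append(tok.sep_token_id)
    bits = []
    for i in range(n):
        pos = 2 + 2 * i
        cands = _top2(ids, pos)            # same context the embedder saw
        bits.append(cands.index(marked_ids[pos]))
        ids[pos] = marked_ids[pos]         # reveal, then move to the next slot
    return tok.decode(cover_ids), bits
```

For example, embed("the cat sat on the mat", [1, 0, 1, 1, 0, 1]) yields token ids whose odd slots spell the cover text; extract replays the same left-to-right masking order, so the receiver sees exactly the context the embedder saw at each slot, which is what makes the candidate ranks (and hence the bits) recoverable.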
Related papers
- TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images [84.08181780666698]
TextDestroyer is the first training- and annotation-free method for scene text destruction.
Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction.
The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
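The scrambling step lends itself to a one-function sketch: replace the latent start code inside detected text regions with draws from the standard Gaussian that diffusion models assume as a prior. The tensor shapes and the availability of a precomputed text mask below are our assumptions, not details from the paper.

```python
import torch

def scramble_text_latents(latent: torch.Tensor, text_mask: torch.Tensor) -> torch.Tensor:
    """Replace latent entries inside detected text regions with Gaussian noise.

    latent:    (C, H, W) diffusion start code (assumed shape).
    text_mask: (H, W) boolean map, True where text was detected (assumed given).
    """
    noise = torch.randn_like(latent)                 # N(0, 1), the diffusion prior
    mask = text_mask.unsqueeze(0).expand_as(latent)  # broadcast over channels
    return torch.where(mask, noise, latent)
```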
arXiv Detail & Related papers (2024-11-01T04:41:00Z)
- Generalized Tampered Scene Text Detection in the era of Generative AI [33.38946428507517]
We present open-set tampered scene text detection, which evaluates forensics models on their ability to identify both seen and unseen forgery types.
We introduce a novel and effective pre-training paradigm that subtly alters the texture of selected texts within an image and trains the model to identify these regions.
We also present DAF, a framework that improves open-set generalization by distinguishing between the features of authentic and tampered text.
arXiv Detail & Related papers (2024-07-31T08:17:23Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
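One way to read "replacing one-hot encoding" is as soft supervision: the training target for each symbol becomes a corpus-derived distribution rather than a single index. The sketch below renders that generic idea as a label-smoothing-style KL loss; it is a plausible reading under our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def make_targets(labels: torch.Tensor, unigram_prior: torch.Tensor, eps: float = 0.1):
    """Mix one-hot labels with a corpus unigram prior (label-smoothing style).

    unigram_prior: (V,) symbol frequencies estimated from a large text corpus
    (our assumption for how the linguistic prior enters the target).
    """
    one_hot = F.one_hot(labels, num_classes=unigram_prior.numel()).float()
    return (1 - eps) * one_hot + eps * unigram_prior

def soft_target_loss(logits: torch.Tensor, target_dist: torch.Tensor) -> torch.Tensor:
    """KL divergence to a corpus-derived distribution instead of plain
    cross-entropy to a one-hot label.

    logits:      (B, V) raw scores from the recognition head.
    target_dist: (B, V) rows summing to 1, e.g. from make_targets above.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, target_dist, reduction="batchmean")
```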
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Text Sanitization Beyond Specific Domains: Zero-Shot Redaction & Substitution with Large Language Models [0.0]
We present a zero-shot text sanitization technique that detects and substitutes potentially sensitive information using Large Language Models.
Our evaluation shows that our method excels at protecting privacy while maintaining text coherence and contextual information.
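Since the method is zero-shot, the core artifact is essentially a prompt. The sketch below shows one plausible shape for it; the prompt wording and the `complete` callable are hypothetical stand-ins, not the paper's actual prompt or model.

```python
from typing import Callable

# Hypothetical prompt for combined redaction & substitution (our wording).
SANITIZE_PROMPT = """Rewrite the text below. Replace every piece of potentially
sensitive information (names, addresses, dates, organizations, IDs) with a
realistic but fictitious substitute of the same type, keeping the text fluent
and coherent. Return only the rewritten text.

Text:
{text}"""

def sanitize(text: str, complete: Callable[[str], str]) -> str:
    """Zero-shot sanitization via a single LLM call.

    `complete` is any function mapping a prompt to a model response
    (a stand-in; the paper's prompts and model may differ).
    """
    return complete(SANITIZE_PROMPT.format(text=text)).strip()
```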
arXiv Detail & Related papers (2023-11-16T18:42:37Z)
- Weakly Supervised Scene Text Generation for Low-resource Languages [19.243705770491577]
A large number of annotated training images are crucial for training successful scene text recognition models.
Existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages.
We propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision.
arXiv Detail & Related papers (2023-06-25T15:26:06Z)
- Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption [23.010346603025255]
We propose a privatization mechanism for embeddings based on homomorphic encryption.
We show that our method offers encrypted protection of BERT embeddings, while largely preserving their utility on downstream text classification tasks.
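As a simplified picture of this setting, a CKKS scheme lets a server score an encrypted sentence embedding against a plaintext linear classifier without ever seeing the embedding. The sketch below uses the TenSEAL library, with a random vector standing in for a BERT embedding and parameters chosen for illustration; it is not the paper's pipeline.

```python
import numpy as np
import tenseal as ts

# CKKS context: client-side key material (illustrative parameters).
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()                # rotations needed for dot products

embedding = np.random.randn(768)          # stand-in for a BERT [CLS] embedding
w = np.random.randn(768)                  # plaintext linear-classifier weights
b = 0.1                                   # plaintext bias

enc_emb = ts.ckks_vector(ctx, embedding.tolist())  # client encrypts
enc_logit = enc_emb.dot(w.tolist())                # server: encrypted dot product
logit = enc_logit.decrypt()[0] + b                 # client decrypts, adds bias
print("approx. logit:", logit, "exact:", float(embedding @ w) + b)
```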
arXiv Detail & Related papers (2022-10-05T21:46:02Z)
- Autoregressive Linguistic Steganography Based on BERT and Consistency Coding [17.881686153284267]
Linguistic steganography (LS) conceals the presence of communication by embedding secret information into a text.
Recent algorithms use a language model (LM) to generate the steganographic text, which provides a higher payload than many earlier approaches.
We propose a novel autoregressive LS algorithm based on BERT and consistency coding, which achieves a better trade-off between embedding payload and system security.
arXiv Detail & Related papers (2022-03-26T02:36:55Z)
- Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding [88.31226340759892]
We present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model.
Human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.
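The coding idea can be shown with a toy stand-in for the language model: partition the current interval by next-token probabilities, emit the token whose slice contains the secret message read as a binary fraction, and let the receiver re-narrow the same intervals to recover the bits. Everything below (the fixed distribution, naive float precision, the dyadic-bin stopping test) is a simplification of the paper's self-adjusting scheme.

```python
# Toy interval-based (arithmetic-coding-style) steganography, for illustration.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran"]
PROBS = [0.30, 0.20, 0.15, 0.15, 0.10, 0.10]  # assumed next-token distribution

def bits_to_fraction(bits):
    """Read a bitstring as a binary fraction in [0, 1)."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(bits))

def encode(bits):
    """Emit tokens until the interval fits inside the message's dyadic bin."""
    n = len(bits)
    f = bits_to_fraction(bits)
    m = f + 2 ** -(n + 1)                 # midpoint of [f, f + 2^-n): interior
    lo, hi, out = 0.0, 1.0, []
    while not (f <= lo and hi <= f + 2 ** -n):
        cum = lo
        for tok, p in zip(VOCAB, PROBS):  # slice [lo, hi) by token probability
            width = (hi - lo) * p
            if cum <= m < cum + width:    # the message falls in this slice
                out.append(tok)
                lo, hi = cum, cum + width
                break
            cum += width
    return out

def decode(tokens, n_bits):
    """Re-narrow the same intervals and read back the leading message bits."""
    lo, hi = 0.0, 1.0
    for tok in tokens:
        cum = lo
        for t, p in zip(VOCAB, PROBS):
            width = (hi - lo) * p
            if t == tok:
                lo, hi = cum, cum + width
                break
            cum += width
    x, bits = lo, []                      # every point of [lo, hi) shares
    for _ in range(n_bits):               # these leading bits with the message
        x *= 2
        bits.append(int(x >= 1.0))
        x -= bits[-1]
    return bits
```

For instance, encode([1, 0]) yields ["cat"] under the toy distribution, and decode(["cat"], 2) returns [1, 0]; a neural LM would simply supply a different PROBS at every step.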
arXiv Detail & Related papers (2020-10-01T20:40:23Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Improving Disentangled Text Representation Learning with Information-Theoretic Guidance [99.68851329919858]
The discrete nature of natural language makes disentangling textual representations more challenging.
Inspired by information theory, we propose a novel method that effectively manifests disentangled representations of text.
Experiments on both conditional text generation and text-style transfer demonstrate the high quality of our disentangled representation.
arXiv Detail & Related papers (2020-06-01T03:36:01Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features from text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
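The fusion step can be pictured as combining per-region visual features with embeddings of the recognized text before the extraction head. Below is a generic concatenate-and-project module under assumed dimensions; TRIE's actual fusion is more elaborate.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuse visual and textual token features for information extraction.

    A generic concat-and-project block; all dimensions are illustrative
    assumptions, not TRIE's architecture.
    """
    def __init__(self, visual_dim: int = 256, text_dim: int = 128, fused_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(visual_dim + text_dim, fused_dim)

    def forward(self, visual_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # visual_feats: (B, T, visual_dim) RoI features per detected text region
        # text_embeds:  (B, T, text_dim) embeddings of the recognized strings
        fused = torch.cat([visual_feats, text_embeds], dim=-1)
        return torch.relu(self.proj(fused))  # (B, T, fused_dim) for the IE head
```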
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.