Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
- URL: http://arxiv.org/abs/2104.09833v1
- Date: Tue, 20 Apr 2021 08:35:53 GMT
- Title: Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
- Authors: Honai Ueoka, Yugo Murawaki and Sadao Kurohashi
- Abstract summary: We revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution.
The proposed method eliminates rule construction and has a high payload capacity for an edit-based model.
It is also shown to be more secure against automatic detection than a generation-based method while offering better control of the security/payload capacity trade-off.
- Score: 21.761511258514673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advances in neural language models, the focus of linguistic
steganography has shifted from edit-based approaches to generation-based ones.
While the latter's payload capacity is impressive, generating genuine-looking
texts remains challenging. In this paper, we revisit edit-based linguistic
steganography, with the idea that a masked language model offers an
off-the-shelf solution. The proposed method eliminates painstaking rule
construction and has a high payload capacity for an edit-based model. It is
also shown to be more secure against automatic detection than a
generation-based method while offering better control of the security/payload
capacity trade-off.
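The mechanism is simple enough to sketch. Below is a minimal illustration, assuming the HuggingFace `transformers` library and `bert-base-cased`; the single edit position and fixed 2-bit slot are simplifications for exposition, not the paper's exact block-coding and probability-threshold scheme.

```python
# Minimal sketch of edit-based steganography with a masked language model.
# Assumes HuggingFace `transformers` and `bert-base-cased`. The fixed
# 2-bit slot and single edit position are simplifying assumptions.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

BITS = 2  # bits hidden per edited position (2**BITS candidates)

def candidates(words, pos):
    """Top word-level replacements for words[pos] when it is masked."""
    masked = words[:pos] + [tokenizer.mask_token] + words[pos + 1:]
    enc = tokenizer(" ".join(masked), return_tensors="pt")
    mask_at = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        logits = model(**enc).logits[0, mask_at]
    top = tokenizer.convert_ids_to_tokens(torch.topk(logits, 50).indices.tolist())
    # Keep whole-word candidates only, so the edit stays a word substitution.
    return [t for t in top if t.isalpha()][: 2 ** BITS]

def encode(words, pos, bits):
    """Hide `bits` (e.g. "10") by choosing the bits-th candidate at pos."""
    out = words.copy()
    out[pos] = candidates(words, pos)[int(bits, 2)]
    return out

def decode(words, pos):
    """Recover the bits as the rank of words[pos] in the candidate list."""
    return format(candidates(words, pos).index(words[pos]), f"0{BITS}b")

cover = "the weather is very nice today".split()
stego = encode(cover, pos=4, bits="10")
print(" ".join(stego), "->", decode(stego, pos=4))
```

Because the context around the edited position is unchanged, the receiver reproduces exactly the same candidate list from the shared model and reads the hidden bits off as the rank of the observed word; no hand-built synonym rules are needed.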
Related papers
- Semantic Steganography: A Framework for Robust and High-Capacity Information Hiding using Large Language Models [25.52890764952079]
Generative linguistic steganography has become a prevalent technique for hiding information within model-generated texts.
We propose a semantic steganography framework based on Large Language Models (LLMs).
This framework offers robustness and reliability for transmission in complex channels, as well as resistance to text rendering and word blocking (a generic sketch of this style of encoding follows this entry).
arXiv Detail & Related papers (2024-12-15T04:04:23Z)
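For contrast with the edit-based sketch above, the generation-based encoding referred to in the entry above can be illustrated generically. This is a hedged sketch assuming GPT-2 via `transformers` and a fixed-width top-k code; practical systems use variable-length codes such as Huffman or arithmetic coding.

```python
# Toy generation-based steganography: at each step, pick among the top
# 2**BITS next tokens according to the next BITS bits of the secret.
# GPT-2 is an assumed stand-in model, not the framework's actual choice.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

BITS = 2

def generate_stego(prompt, bitstring):
    ids = tok(prompt, return_tensors="pt").input_ids
    for i in range(0, len(bitstring), BITS):
        with torch.no_grad():
            logits = lm(ids).logits[0, -1]          # next-token scores
        top = torch.topk(logits, 2 ** BITS).indices # candidate tokens
        choice = top[int(bitstring[i:i + BITS], 2)] # bits select the token
        ids = torch.cat([ids, choice.view(1, 1)], dim=1)
    return tok.decode(ids[0])

print(generate_stego("The meeting will take place", "011000"))
```

The receiver, holding the same model, prompt, and width, replays the loop and recovers each chunk as the rank of the observed token among the top candidates.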
- Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis [33.909582975045545]
We propose a phonetic enhanced language modeling method to improve the performance of TTS models.
We leverage self-supervised representations that are phonetically rich as the training target for the autoregressive language model.
arXiv Detail & Related papers (2024-06-04T06:43:34Z)
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models [25.212449683397647]
This paper studies the integration of a contrastive learning objective for fine-tuning LLMs for implicit knowledge editing and controlled text generation.
To facilitate training the model in a self-supervised fashion, we leverage an off-the-shelf LLM for training data generation.
arXiv Detail & Related papers (2024-01-16T16:49:39Z)
- NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation [55.17069935305069]
We introduce NeuroCounterfactuals, designed as loose counterfactuals, allowing for larger edits which result in naturalistic generations containing linguistic diversity.
Our novel generative approach bridges the benefits of constrained decoding, with those of language model adaptation for sentiment steering.
arXiv Detail & Related papers (2022-10-22T06:29:21Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation, which alleviates the training/inference mismatch problem at a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Correcting Automated and Manual Speech Transcription Errors using Warped Language Models [2.8614709576106874]
We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
arXiv Detail & Related papers (2021-03-26T16:43:23Z)
- GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interest in recent years.
Current approaches still lack the ability to preserve the content and even logic of original sentences.
We propose a Graph-Transformer based Auto-Encoder (GTAE), which models a sentence as a linguistic graph and performs feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
- Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate long-horizon consistency issues in text generation.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated (a toy scoring sketch follows this list).
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
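As a toy version of the hypothesis-testing framing in the last entry above, one can score a text by its average log-likelihood under a reference language model and threshold the score; the model (`gpt2`) and the threshold here are assumptions for illustration, not that paper's actual test.

```python
# Toy detector in the hypothesis-testing framing: H0 = human-written,
# H1 = model-generated. Score = average next-token log-likelihood under
# an assumed reference model (gpt2); the threshold TAU is hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

TAU = -3.5  # hypothetical decision threshold (nats per token)

def score(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean cross-entropy per token
    return -loss.item()  # higher = more probable under the model

text = "The results of the experiment were consistent with the hypothesis."
print("generated" if score(text) > TAU else "genuine")
```

In the Neyman-Pearson framing, TAU would be calibrated on held-out human-written text to fix the false-alarm rate.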
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.