Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
- URL: http://arxiv.org/abs/2104.09833v1
- Date: Tue, 20 Apr 2021 08:35:53 GMT
- Title: Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
- Authors: Honai Ueoka, Yugo Murawaki and Sadao Kurohashi
- Abstract summary: We revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution.
The proposed method eliminates rule construction and has a high payload capacity for an edit-based model.
It is also shown to be more secure against automatic detection than a generation-based method while offering better control of the security/payload capacity trade-off.
- Score: 21.761511258514673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advances in neural language models, the focus of linguistic
steganography has shifted from edit-based approaches to generation-based ones.
While the latter's payload capacity is impressive, generating genuine-looking
texts remains challenging. In this paper, we revisit edit-based linguistic
steganography, with the idea that a masked language model offers an
off-the-shelf solution. The proposed method eliminates painstaking rule
construction and has a high payload capacity for an edit-based model. It is
also shown to be more secure against automatic detection than a
generation-based method while offering better control of the security/payload
capacity trade-off.
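As a concrete illustration, the following is a minimal sketch of how a masked language model can hide secret bits through word substitutions. It is not the authors' exact algorithm: the model choice, the top-k candidate rule, and the bit assignment below are simplified assumptions, and a practical scheme also needs candidate filtering for security plus a deterministic procedure so the receiver recovers the same candidate sets.

```python
# Minimal sketch of edit-based steganography with a masked LM.
# NOT the paper's exact method: in a real scheme only a few positions
# are edited, chosen by a security criterion shared with the receiver.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def embed_bits(cover_text: str, bits: str, topk: int = 4) -> str:
    """Hide `bits` in `cover_text` by replacing tokens with MLM candidates."""
    ids = tokenizer(cover_text, return_tensors="pt")["input_ids"][0]
    stego, i = ids.clone(), 0
    bits_per_edit = topk.bit_length() - 1          # log2(topk) bits per edit
    for pos in range(1, len(ids) - 1):             # skip [CLS] and [SEP]
        if i >= len(bits):
            break
        masked = stego.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        candidates = torch.topk(logits, topk).indices  # top-k substitutes
        chunk = bits[i:i + bits_per_edit].ljust(bits_per_edit, "0")
        stego[pos] = candidates[int(chunk, 2)]     # candidate index = secret bits
        i += bits_per_edit
    return tokenizer.decode(stego[1:-1])
```

The receiver would rerun the same masking procedure on the stego text and read off each replaced token's index within the recovered candidate list; making that recovery exact is the part this sketch deliberately glosses over.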
Related papers
- Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis [33.909582975045545]
We propose a phonetic enhanced language modeling method to improve the performance of TTS models.
We leverage phonetically rich self-supervised representations as the training target for the autoregressive language model.
arXiv Detail & Related papers (2024-06-04T06:43:34Z)
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models [25.212449683397647]
This paper studies the integration of a contrastive learning objective for fine-tuning LLMs for implicit knowledge editing and controlled text generation.
To facilitate training the model in a self-supervised fashion, we leverage an off-the-shelf LLM for training data generation.
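One way to picture such a contrastive objective is as a margin loss over the log-likelihoods of paired desirable and undesirable continuations; the hinge form and margin below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def contrastive_ppl_loss(logp_pos: torch.Tensor,
                         logp_neg: torch.Tensor,
                         margin: float = 1.0) -> torch.Tensor:
    """Push the per-token log-likelihood of the desirable continuation
    above that of the undesirable one by at least `margin`."""
    return torch.clamp(margin - (logp_pos - logp_neg), min=0.0).mean()
```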
arXiv Detail & Related papers (2024-01-16T16:49:39Z)
- Few-Shot Detection of Machine-Generated Text using Style Representations [4.326503887981912]
Language models that convincingly mimic human writing pose a significant risk of abuse.
We propose to leverage representations of writing style estimated from human-authored text.
We find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors.
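A minimal sketch of the few-shot idea, with a generic sentence encoder standing in for the paper's writing-style representations (the encoder choice and threshold are assumptions):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder style encoder

def looks_machine_written(query: str, human_examples: list[str],
                          threshold: float = 0.5) -> bool:
    """Flag `query` if its style is unlike the few human-written examples."""
    vecs = encoder.encode(human_examples + [query], normalize_embeddings=True)
    centroid = vecs[:-1].mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    similarity = float(vecs[-1] @ centroid)        # cosine similarity
    return similarity < threshold                  # far from human style
```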
arXiv Detail & Related papers (2024-01-12T17:26:51Z)
- NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation [55.17069935305069]
We introduce NeuroCounterfactuals, designed as loose counterfactuals, allowing for larger edits which result in naturalistic generations containing linguistic diversity.
Our novel generative approach bridges the benefits of constrained decoding, with those of language model adaptation for sentiment steering.
arXiv Detail & Related papers (2022-10-22T06:29:21Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation that alleviates the training/inference mismatch problem at a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
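The factorization can be sketched as two separate heads over the predictor state, one for the blank symbol and one for the vocabulary; the module below is an illustrative assumption about the structure, not the paper's architecture, and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class FactorizedPredictor(nn.Module):
    """Separate blank and vocabulary heads, so the vocabulary branch
    behaves like a standalone LM that can be adapted on text-only data."""
    def __init__(self, hidden: int = 256, vocab_size: int = 1000):
        super().__init__()
        self.blank_head = nn.Linear(hidden, 1)           # scores the blank
        self.vocab_head = nn.Linear(hidden, vocab_size)  # LM-like branch

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Combine into one score vector over [blank] + vocabulary.
        return torch.cat([self.blank_head(h), self.vocab_head(h)], dim=-1)
```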
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Correcting Automated and Manual Speech Transcription Errors using Warped Language Models [2.8614709576106874]
We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
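A rough sketch of masked-LM-based transcription correction, with an ordinary BERT fill-mask pipeline standing in for the paper's warped language model and an assumed confidence threshold:

```python
from transformers import pipeline

# A standard masked LM stands in for the warped language model here.
fill = pipeline("fill-mask", model="bert-base-uncased")

def correct_transcript(tokens: list[str], confidence: float = 0.9) -> list[str]:
    """Mask each word in turn and accept the MLM's top guess only when
    the model is very confident the word should change."""
    fixed = list(tokens)
    for i in range(len(fixed)):
        masked = " ".join(fixed[:i] + [fill.tokenizer.mask_token] + fixed[i + 1:])
        best = fill(masked, top_k=1)[0]
        guess = best["token_str"].strip()
        if best["score"] >= confidence and guess != fixed[i]:
            fixed[i] = guess
    return fixed
```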
arXiv Detail & Related papers (2021-03-26T16:43:23Z)
- GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interests in recent years.
Current approaches still lack the ability to preserve the content, and even the logic, of the original sentences.
We propose Graph-Transformer based Auto-Encoders (GTAE), which model a sentence as a linguistic graph and perform feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
- Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to address the difficulty of modeling the distant future in adversarial text generation.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Large-scale language models that can generate long, coherent text are considered dangerous by some, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
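The hypothesis-testing view can be sketched as thresholding a text's average log-likelihood under the candidate generator; GPT-2 and the threshold below are illustrative assumptions rather than the paper's setup.

```python
# Sketch: H0 = human-written, H1 = generated. Reject H0 when the text is
# unusually likely under the suspected generator.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

def looks_generated(text: str, threshold: float = -3.0) -> bool:
    """Threshold the average per-token log-likelihood of `text`."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss    # mean negative log-likelihood
    avg_logprob = -loss.item()
    return avg_logprob > threshold         # unusually likely => generated
```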
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.