Topic-Preserving Synthetic News Generation: An Adversarial Deep
Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2010.16324v1
- Date: Fri, 30 Oct 2020 15:29:16 GMT
- Title: Topic-Preserving Synthetic News Generation: An Adversarial Deep
Reinforcement Learning Approach
- Authors: Ahmadreza Mosallanezhad, Kai Shu, Huan Liu
- Abstract summary: GPT-2 can generate readable text and can be fine-tuned to generate text for a specific domain.
In this paper, we study the novel problem of topic-preserving synthetic news generation.
We propose a novel deep reinforcement learning-based method to control the output of GPT-2 with respect to a given news topic.
- Score: 40.254715367640635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, there exist powerful language models such as OpenAI's GPT-2 that
can generate readable text and can be fine-tuned to generate text for a
specific domain. However, GPT-2 cannot directly generate synthetic news on a
given topic, and its output cannot be explicitly controlled. In this paper, we
study the novel problem of
topic-preserving synthetic news generation. We propose a novel deep
reinforcement learning-based method to control the output of GPT-2 with respect
to a given news topic. When generating text using GPT-2, by default, the most
probable word is selected from the vocabulary. Instead of selecting the best
word each time from GPT-2's output, an RL agent tries to select words that
optimize the matching of a given topic. In addition, using a fake news detector
as an adversary, we investigate generating realistic news using our proposed
method. In this paper, we consider realistic news as news that cannot be easily
detected by a fake news classifier. Experimental results demonstrate that the
proposed framework is more effective than state-of-the-art baselines at
generating topic-preserving news content.
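The core mechanism described above, selecting tokens that trade off fluency against topic match rather than taking GPT-2's argmax, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the keyword-overlap bonus stands in for the paper's trained RL agent, and the topic words, top-k size, and weighting are invented.

```python
# Minimal sketch: re-rank GPT-2's top-k candidates by a topic bonus instead of
# taking the plain argmax. The bonus is a keyword-overlap stand-in for the
# paper's RL agent (hypothetical, not the authors' implementation).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

topic_words = ["election", "vote", "senate"]          # hypothetical topic
topic_ids = {i for w in topic_words for i in tok.encode(" " + w)}

def topic_bonus(token_id: int) -> float:
    """Reward candidates that match the topic (stand-in for the RL reward)."""
    return 1.0 if int(token_id) in topic_ids else 0.0

ids = tok.encode("Breaking news:", return_tensors="pt")
for _ in range(30):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    logprobs, cand = torch.topk(torch.log_softmax(logits, -1), k=10)
    # Fluency (log-prob) plus topic match: the quantity the agent optimizes.
    scores = logprobs + torch.tensor([topic_bonus(c) for c in cand])
    nxt = cand[int(scores.argmax())].view(1, 1)
    ids = torch.cat([ids, nxt], dim=1)
print(tok.decode(ids[0]))
```

In the paper the selection policy is learned with reinforcement learning and trained against a fake-news-detector adversary; the fixed bonus here only mimics the decoding-time interface.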
Related papers
- Augmenting text for spoken language understanding with Large Language Models [13.240782495441275]
We propose to prompt Large Language Models (LLMs) to generate unpaired text for existing and new domains.
We show how to use transcript-semantic parse data (unpaired text) without corresponding speech.
Experiments show that unpaired text from existing and new domains improves performance by 2% and 30% in absolute Exact Match (EM) respectively.
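As a hedged illustration of the prompting idea (the prompt wording, model, domain, and intent names below are assumptions, not the paper's), one can ask a generic text-generation model for synthetic utterances in a new domain:

```python
# Hypothetical sketch of prompting an LLM for unpaired training text; the
# prompt wording, model, domain, and intent are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for an LLM

def make_prompt(domain: str, intent: str, n: int = 5) -> str:
    return (f"Write {n} different things a user might say to a voice "
            f"assistant in the '{domain}' domain with intent '{intent}':\n1.")

out = generator(make_prompt("reminders", "create_reminder"),
                max_new_tokens=60, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])
```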
arXiv Detail & Related papers (2023-09-17T22:25:34Z)
- TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection [15.386007761649251]
We propose a novel Title-Text similarity and emotion-aware Fake news detection (TieFake) method by jointly modeling the multi-modal context information and the author sentiment.
Specifically, we employ BERT and ResNeSt to learn the representations for text and images, and utilize publisher emotion extractor to capture the author's subjective emotion in the news content.
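A minimal sketch of the title-text similarity signal, assuming mean-pooled BERT embeddings and cosine similarity; TieFake's full model also uses ResNeSt image features and an emotion extractor, which are omitted here:

```python
# Hypothetical sketch of one TieFake feature: title-body similarity from
# mean-pooled BERT embeddings (image and emotion branches omitted).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(text: str) -> torch.Tensor:
    enc = tok(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state        # (1, seq, 768)
    mask = enc["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)       # mean over real tokens

title_vec = embed("Scientists discover water on Mars")
body_vec = embed("A NASA probe has confirmed subsurface ice deposits ...")
sim = torch.cosine_similarity(title_vec, body_vec)    # one detection feature
print(float(sim))
```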
arXiv Detail & Related papers (2023-04-19T04:47:36Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation [105.20743048379387]
We propose a novel framework for generating training examples informed by the known styles and strategies of human-authored propaganda.
Specifically, we perform self-critical sequence training guided by natural language inference to ensure the validity of the generated articles.
Our experimental results show that fake news detectors trained on PropaNews are better at detecting human-written disinformation by 3.62%-7.69% absolute F1 on two public datasets.
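A hedged sketch of the self-critical training step mentioned above, assuming the `roberta-large-mnli` checkpoint and an entailment-probability reward; the paper's exact reward and training setup may differ:

```python
# Hypothetical sketch of self-critical sequence training with an NLI reward;
# model name and reward definition are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

nli_tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large-mnli").eval()

def nli_reward(premise: str, hypothesis: str) -> float:
    """Probability that `hypothesis` is entailed by `premise`."""
    enc = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**enc).logits.softmax(-1)[0]
    return float(probs[2])        # index 2 = entailment for this checkpoint

def scst_loss(logprob_sample: torch.Tensor, sample_text: str,
              greedy_text: str, source: str) -> torch.Tensor:
    # Self-critical baseline: reward of the greedy decode. logprob_sample is
    # the generator's summed log-prob of the sample (computed elsewhere).
    advantage = nli_reward(source, sample_text) - nli_reward(source, greedy_text)
    return -advantage * logprob_sample   # REINFORCE with greedy baseline
```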
arXiv Detail & Related papers (2022-03-10T14:24:19Z)
- A Plug-and-Play Method for Controlled Text Generation [38.283313068622085]
We present a plug-and-play decoding method for controlled language generation that is so simple and intuitive, it can be described in a single sentence.
Despite the simplicity of this approach, we see it works incredibly well in practice.
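A simplified sketch of the plug-and-play idea, assuming the control is an additive, similarity-weighted shift of GPT-2's logits toward a guide keyword; the keyword and the shift strength are invented, and this is a simplification rather than the paper's exact formulation:

```python
# Hypothetical sketch of plug-and-play controlled decoding: bias logits of
# words semantically close to a guide keyword (keyword and weight assumed).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
emb = model.get_input_embeddings().weight             # (vocab, dim)

guide = tok.encode(" football")                        # hypothetical keyword
guide_vec = emb[guide].mean(0)
# Cosine similarity of every vocabulary word to the guide keyword.
sim = torch.cosine_similarity(emb, guide_vec.unsqueeze(0), dim=-1)

ids = tok.encode("The match last night", return_tensors="pt")
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    shifted = logits + 5.0 * sim                       # plug-and-play shift
    nxt = shifted.argmax().view(1, 1)
    ids = torch.cat([ids, nxt], dim=1)
print(tok.decode(ids[0]))
```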
arXiv Detail & Related papers (2021-09-20T17:27:03Z)
- DeepTitle -- Leveraging BERT to generate Search Engine Optimized Headlines [0.0]
We showcase how a pre-trained language model can be leveraged to create an abstractive news headline generator for the German language.
We incorporate state-of-the-art fine-tuning techniques for abstractive text summarization, i.e., we use different optimizers for the encoder and decoder.
We conduct experiments on a German news data set and achieve a ROUGE-L F-score of 40.02.
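A minimal sketch of that recipe, assuming a generic encoder-decoder module with hypothetical `encoder`/`decoder` attributes; the optimizer choices and learning rates are illustrative, not the paper's hyperparameters:

```python
# Hypothetical sketch: the pre-trained encoder gets a smaller learning rate
# than the decoder trained from scratch; values are illustrative assumptions.
import torch

def build_optimizers(model: torch.nn.Module):
    enc_opt = torch.optim.Adam(model.encoder.parameters(), lr=2e-5)  # pre-trained
    dec_opt = torch.optim.Adam(model.decoder.parameters(), lr=1e-4)  # from scratch
    return enc_opt, dec_opt

# Per training step: one backward pass, then both optimizers step.
#   loss.backward(); enc_opt.step(); dec_opt.step()
```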
arXiv Detail & Related papers (2021-07-22T21:32:54Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
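A generic Sinkhorn-style sketch of an optimal-transport matching loss between generated and reference token embeddings; this is a standard entropic-OT approximation under uniform marginals, not the paper's exact Student-Forcing OT algorithm:

```python
# Generic entropic-OT (Sinkhorn) matching loss between two embedding sequences;
# an illustrative stand-in, not the paper's exact method.
import torch

def sinkhorn_ot(x: torch.Tensor, y: torch.Tensor,
                eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """x: (n, d) generated-token embeddings; y: (m, d) reference embeddings."""
    cost = 1 - torch.cosine_similarity(x.unsqueeze(1), y.unsqueeze(0), dim=-1)
    K = torch.exp(-cost / eps)                       # (n, m) Gibbs kernel
    u = torch.full((x.size(0),), 1.0 / x.size(0))    # uniform marginals
    v = torch.full((y.size(0),), 1.0 / y.size(0))
    a, b = torch.ones_like(u), torch.ones_like(v)
    for _ in range(iters):                           # Sinkhorn iterations
        a = u / (K @ b)
        b = v / (K.T @ a)
    plan = a.unsqueeze(1) * K * b.unsqueeze(0)       # soft alignment plan
    return (plan * cost).sum()                       # OT matching loss

loss = sinkhorn_ot(torch.randn(7, 64), torch.randn(9, 64))
```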
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Viable Threat on News Reading: Generating Biased News Using Natural Language Models [49.90665530780664]
We show that publicly available language models can reliably generate biased news content conditioned on an original input news article.
We also show that a large number of high-quality biased news articles can be generated using controllable text generation.
arXiv Detail & Related papers (2020-10-05T16:55:39Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- Assessing Discourse Relations in Language Generation from GPT-2 [37.30382375828105]
GPT-2 is suited for generation tasks given its left-to-right language modeling objective.
We study the validity of explicit discourse relations in GPT-2's outputs under both organic generation and fine-tuned scenarios.
arXiv Detail & Related papers (2020-04-26T23:29:27Z)