MEGATRON-CNTRL: Controllable Story Generation with External Knowledge
Using Large-Scale Language Models
- URL: http://arxiv.org/abs/2010.00840v1
- Date: Fri, 2 Oct 2020 08:07:12 GMT
- Title: MEGATRON-CNTRL: Controllable Story Generation with External Knowledge
Using Large-Scale Language Models
- Authors: Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung,
Anima Anandkumar and Bryan Catanzaro
- Abstract summary: We propose a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base.
Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator.
The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset.
- Score: 98.53511229517463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing pre-trained large language models have shown unparalleled generative
capabilities. However, they are not controllable. In this paper, we propose
MEGATRON-CNTRL, a novel framework that uses large-scale language models and
adds control to text generation by incorporating an external knowledge base.
Our framework consists of a keyword predictor, a knowledge retriever, a
contextual knowledge ranker, and a conditional text generator. As we do not
have access to ground-truth supervision for the knowledge ranker, we make use
of weak supervision from sentence embeddings. The empirical results show that
our model generates more fluent, consistent, and coherent stories with less
repetition and higher diversity compared to prior work on the ROC story
dataset. We showcase the controllability of our model by replacing the keywords
used to generate stories and re-running the generation process. Human
evaluation results show that 77.5% of these stories are successfully controlled
by the new keywords. Furthermore, by scaling our model from 124 million to 8.3
billion parameters we demonstrate that larger models improve both the quality
of generation (from 74.5% to 93.0% for consistency) and controllability (from
77.5% to 91.5%).
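The abstract outlines a four-stage pipeline: a keyword predictor proposes control keywords for the next sentence, a knowledge retriever pulls matching entries from an external knowledge base, a contextual ranker (trained with weak supervision from sentence embeddings) orders them by relevance, and a conditional generator continues the story. The paper itself provides no code here; the following is a minimal sketch of that control flow with toy stand-ins for every component, and all names (predict_keywords, rank_facts, TOY_KB, and so on) are hypothetical.

```python
# Minimal sketch of the generation loop described in the abstract:
# keyword predictor -> knowledge retriever -> contextual knowledge ranker ->
# conditional text generator. Every component below is a toy stand-in;
# none of this is the authors' released implementation.
from dataclasses import dataclass


@dataclass
class Fact:
    """An entry from an external knowledge base, keyed by a keyword."""
    keyword: str
    text: str


TOY_KB = [
    Fact("storm", "a storm can cause a power outage"),
    Fact("storm", "people stay indoors during a storm"),
    Fact("candle", "a candle gives light when the power is out"),
]


def predict_keywords(context: str) -> list[str]:
    """Keyword predictor: propose control keywords for the next sentence.
    Stand-in: echo known content words; the paper uses a learned predictor."""
    return [w for w in context.lower().split() if w in {"storm", "candle"}]


def retrieve_facts(keywords: list[str], kb: list[Fact]) -> list[Fact]:
    """Knowledge retriever: pull KB entries that match the predicted keywords."""
    return [f for f in kb if f.keyword in keywords]


def rank_facts(context: str, facts: list[Fact]) -> list[Fact]:
    """Contextual knowledge ranker: order facts by relevance to the story so far.
    The paper trains this with weak supervision from sentence embeddings;
    token overlap stands in for embedding similarity here."""
    ctx = set(context.lower().split())
    return sorted(facts, key=lambda f: -len(ctx & set(f.text.split())))


def generate_sentence(context: str, facts: list[Fact]) -> str:
    """Conditional generator: continue the story given the context plus the
    top-ranked knowledge. Stand-in for the large pretrained language model."""
    knowledge = facts[0].text if facts else "no retrieved knowledge"
    return f"(next sentence conditioned on: {knowledge})"


def generate_story(first_sentence: str, num_sentences: int = 3) -> list[str]:
    story = [first_sentence]
    for _ in range(num_sentences):
        context = " ".join(story)
        keywords = predict_keywords(context)               # step 1
        facts = retrieve_facts(keywords, TOY_KB)           # step 2
        ranked = rank_facts(context, facts)                # step 3
        story.append(generate_sentence(context, ranked))   # step 4
    return story


if __name__ == "__main__":
    print("\n".join(generate_story("A storm knocked out the power.")))
```

Control comes from step 1: replacing the predicted keywords with user-chosen ones changes which knowledge is retrieved and ranked, and therefore how the generator continues the story, which is the manipulation the human evaluation in the abstract measures.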
Related papers
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction (a generic sketch of these two objectives appears after this list).
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language
Models [58.42146641102329]
We develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC).
KiC empowers a parametric text-to-text language model with a knowledge-rich external memory.
As a knowledge-rich semi-parametric language model, KiC only needs a much smaller parametric part to achieve superior zero-shot performance on unseen tasks.
arXiv Detail & Related papers (2022-10-28T23:18:43Z) - MOCHA: A Multi-Task Training Approach for Coherent Text Generation from
Cognitive Perspective [22.69509556890676]
We propose a novel multi-task training strategy for coherent text generation grounded on the cognitive theory of writing.
We extensively evaluate our model on three open-ended generation tasks including story generation, news article writing and argument generation.
arXiv Detail & Related papers (2022-10-26T11:55:41Z) - Robust Preference Learning for Storytelling via Contrastive
Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general-purpose preference model (a generic sketch of such a bi-encoder appears after this list).
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language
Understanding and Generation [25.430130072811075]
We propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.
It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks.
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
arXiv Detail & Related papers (2021-07-05T16:54:59Z) - Neural Models for Offensive Language Detection [0.0]
Offensive language detection is an ever-growing natural language processing (NLP) application.
We believe that improving and comparing different machine learning models to fight such harmful content is an important and challenging goal for this thesis.
arXiv Detail & Related papers (2021-05-30T13:02:45Z)
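The "Commonsense Knowledge Transfer" entry above names two self-supervised objectives, commonsense mask infilling and commonsense relation prediction. As referenced from that entry, here is a minimal sketch of how training examples for those two objectives could be built from knowledge triples; the triple format, mask token, and helper names are assumptions for illustration, not the paper's implementation.

```python
# Generic sketch of the two self-supervised objectives named in the
# "Commonsense Knowledge Transfer" entry: commonsense mask infilling and
# commonsense relation prediction. The triple format, mask token, and example
# builders are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

MASK = "<mask>"


@dataclass
class Triple:
    """A (head, relation, tail) fact extracted from a neural commonsense model."""
    head: str
    relation: str
    tail: str


def mask_infilling_example(sentence: str, triple: Triple) -> tuple[str, str]:
    """Commonsense mask infilling: hide the span carrying the commonsense
    knowledge (here, the tail concept) and train the LM to reconstruct it."""
    masked = sentence.replace(triple.tail, MASK)
    return masked, triple.tail


def relation_prediction_example(triple: Triple) -> tuple[str, str]:
    """Commonsense relation prediction: given the two concepts, predict the
    relation that links them."""
    prompt = f"{triple.head} {MASK} {triple.tail}"
    return prompt, triple.relation


if __name__ == "__main__":
    t = Triple("umbrella", "UsedFor", "staying dry in the rain")
    sent = "She grabbed an umbrella for staying dry in the rain."
    print(mask_infilling_example(sent, t))   # masked sentence, hidden span
    print(relation_prediction_example(t))    # concept prompt, relation label
```

In the paper's framing, such triples are obtained by querying a neural commonsense knowledge model with text-derived prompts; this sketch only shows the example construction once triples are available.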
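The "Robust Preference Learning for Storytelling" entry above trains a contrastive bi-encoder to align stories with human critiques. As referenced there, the sketch below shows one generic way such a bi-encoder preference model can be set up: twin encoders, cosine similarity, and an in-batch InfoNCE-style loss. The tiny encoder, the loss choice, and the temperature are assumptions, not the paper's recipe.

```python
# Generic contrastive bi-encoder sketch for aligning stories with critiques,
# in the spirit of the storytelling preference-learning entry above.
# Encoders, pooling, and loss are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextEncoder(nn.Module):
    """Tiny stand-in encoder: embedding + mean pooling. A real system would use
    a pretrained transformer encoder here."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(token_ids).mean(dim=1)           # (batch, dim)


class BiEncoderPreferenceModel(nn.Module):
    """Encodes stories and critiques separately and returns the batch-by-batch
    cosine similarity matrix; the diagonal holds the scores of the true pairs."""
    def __init__(self):
        super().__init__()
        self.story_enc = TextEncoder()
        self.critique_enc = TextEncoder()

    def forward(self, story_ids, critique_ids):
        s = F.normalize(self.story_enc(story_ids), dim=-1)
        c = F.normalize(self.critique_enc(critique_ids), dim=-1)
        return s @ c.T                                    # (batch, batch)


def contrastive_loss(sim: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """In-batch InfoNCE: each story should match its own critique (the diagonal)
    rather than the critiques paired with other stories."""
    labels = torch.arange(sim.size(0))
    return F.cross_entropy(sim / temperature, labels)


if __name__ == "__main__":
    model = BiEncoderPreferenceModel()
    stories = torch.randint(0, 1000, (4, 32))    # 4 toy stories, 32 token ids each
    critiques = torch.randint(0, 1000, (4, 16))  # their paired critiques
    loss = contrastive_loss(model(stories, critiques))
    loss.backward()                               # the preference model trains end to end
    print(float(loss))
```

Per that entry, a contrastive reward model of this shape is then further fine-tuned with a prompt-learning technique to make story generation more robust.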