Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
- URL: http://arxiv.org/abs/2010.05141v1
- Date: Sun, 11 Oct 2020 02:38:21 GMT
- Title: Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
- Authors: Dongyeop Kang, Eduard Hovy
- Abstract summary: We propose SSPlanner, a self-supervised text planner that first predicts what to say (content prediction).
It then guides the pretrained language model (surface realization) using the predicted content.
We also find that a combination of noun and verb types of keywords is the most effective for content selection.
- Score: 14.483791451578007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent success of contextualized language models on various NLP
tasks, a language model by itself cannot capture the textual coherence of a long,
multi-sentence document (e.g., a paragraph). Humans often make structural
decisions on what to say and how to say it before making utterances. Guiding
surface realization with such high-level decisions and structuring text in a
coherent way is essentially a planning process. Where can the model
learn such high-level coherence? A paragraph itself contains various forms of
inductive coherence signals, called self-supervision in this work, such as
sentence order, topical keywords, and rhetorical structures. Motivated
by this, we propose a new paragraph completion task, PARCOM: predicting
masked sentences in a paragraph. The task is challenging because the model must
predict and select topical content appropriate to the given context. To
address this, we propose a self-supervised text planner, SSPlanner, that first predicts
what to say (content prediction) and then guides a pretrained language
model (surface realization) using the predicted content. SSPlanner outperforms
the baseline generation models on the paragraph completion task in both
automatic and human evaluation. We also find that a combination of noun- and
verb-type keywords is the most effective for content selection, and overall
generation quality increases as more content keywords are provided.
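The plan-then-generate idea described above can be illustrated with a minimal sketch. Note that this is not the paper's SSPlanner implementation: the keyword-selection heuristic, the prompt format, and the use of GPT-2 via the Hugging Face transformers library are illustrative assumptions standing in for the learned content-prediction and surface-realization components.
```python
# Minimal sketch of a plan-then-generate pipeline for paragraph completion.
# NOTE: not the paper's SSPlanner; keyword selection and prompting are toy stand-ins.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def predict_keywords(context: str, k: int = 5) -> list[str]:
    """Toy 'content prediction' step: pick the k longest unique words from the
    context as stand-ins for the predicted noun/verb keywords."""
    words = {w.strip(".,").lower() for w in context.split()}
    return sorted(words, key=len, reverse=True)[:k]

def complete_paragraph(context: str) -> str:
    """Toy 'surface realization' step: condition the pretrained LM on the
    context plus the predicted keywords, then decode the missing sentence."""
    keywords = predict_keywords(context)
    prompt = context + " Keywords: " + ", ".join(keywords) + ". Next sentence:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated continuation, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

print(complete_paragraph(
    "Humans often plan what to say before writing. A paragraph carries "
    "coherence signals such as sentence order and topical keywords."
))
```
In the paper, the content-prediction step is a learned model trained on self-supervised signals extracted from paragraphs, rather than the surface heuristic used here.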
Related papers
- EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation [114.50719922069261]
We propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text)
EIPE-text has three stages: plan extraction, learning, and inference.
We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling.
arXiv Detail & Related papers (2023-10-12T10:21:37Z)
- Leveraging Natural Supervision for Language Representation Learning and Generation [8.083109555490475]
We describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision.
We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks.
We propose a framework that uses paraphrase pairs to disentangle semantics and syntax in sentence representations.
arXiv Detail & Related papers (2022-07-21T17:26:03Z)
- Data-to-text Generation with Variational Sequential Planning [74.3955521225497]
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input.
We propose a neural model enhanced with a planning component responsible for organizing high-level information in a coherent and meaningful way.
We infer latent plans sequentially with a structured variational model, while interleaving the steps of planning and generation.
arXiv Detail & Related papers (2022-02-28T13:17:59Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Improving Text Auto-Completion with Next Phrase Prediction [9.385387026783103]
Our strategy includes a novel self-supervised training objective called Next Phrase Prediction (NPP)
Preliminary experiments have shown that our approach is able to outperform the baselines in auto-completion for email and academic writing domains.
arXiv Detail & Related papers (2021-09-15T04:26:15Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model, which can represent the prefix sentences at sentence level and discourse level in the decoding process.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries [46.183289748907804]
We propose SOE, a pipelined system that summarizes, outlines, and elaborates for long text generation.
SOE produces long texts with significantly better quality, along with faster convergence speed.
arXiv Detail & Related papers (2020-10-14T13:22:20Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Toward Better Storylines with Sentence-Level Language Models [54.91921545103256]
We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives.
We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task.
arXiv Detail & Related papers (2020-05-11T16:54:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.