The Next Chapter: A Study of Large Language Models in Storytelling
- URL: http://arxiv.org/abs/2301.09790v3
- Date: Mon, 24 Jul 2023 10:03:01 GMT
- Title: The Next Chapter: A Study of Large Language Models in Storytelling
- Authors: Zhuohan Xie, Trevor Cohn, Jey Han Lau
- Abstract summary: The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks.
This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models.
The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
- Score: 51.338324023617034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To enhance the quality of generated stories, recent story generation models
have been investigating the utilization of higher-level attributes like plots
or commonsense knowledge. The application of prompt-based learning with large
language models (LLMs), exemplified by GPT-3, has exhibited remarkable
performance in diverse natural language processing (NLP) tasks. This paper
conducts a comprehensive investigation, utilizing both automatic and human
evaluation, to compare the story generation capacity of LLMs with recent models
across three datasets with variations in style, register, and length of
stories. The results demonstrate that LLMs generate stories of significantly
higher quality compared to other story generation models. Moreover, they
exhibit a level of performance that competes with human authors, albeit with
the preliminary observation that they tend to replicate real stories in
situations involving world knowledge, resembling a form of plagiarism.
Related papers
- Are Large Language Models Capable of Generating Human-Level Narratives? [114.34140090869175]
This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression.
We introduce a novel computational framework to analyze narratives through three discourse-level aspects.
We show that explicit integration of discourse features can enhance storytelling, as is demonstrated by over 40% improvement in neural storytelling.
arXiv Detail & Related papers (2024-07-18T08:02:49Z) - Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition [8.058451580903123]
We introduce a novel method that measures story quality in terms of human likeness.
We then use this method to evaluate the stories generated by several models.
Upgrading the visual and language components of TAPM results in a model that yields competitive performance.
arXiv Detail & Related papers (2024-07-05T14:48:15Z) - Improving Visual Storytelling with Multimodal Large Language Models [1.325953054381901]
This paper presents a novel approach leveraging large language models (LLMs) and large vision-language models (LVLMs)
We introduce a new dataset comprising diverse visual stories, annotated with detailed captions and multimodal elements.
Our method employs a combination of supervised and reinforcement learning to fine-tune the model, enhancing its narrative generation capabilities.
arXiv Detail & Related papers (2024-07-02T18:13:55Z) - Exploration of Masked and Causal Language Modelling for Text Generation [6.26998839917804]
This paper conducts an extensive comparison of Causal Language Modelling approaches for text generation tasks.
We first employ quantitative metrics and then perform a qualitative human evaluation to analyse coherence and grammatical correctness.
The results show that consistently outperforms CLM in text generation across all datasets.
arXiv Detail & Related papers (2024-05-21T09:33:31Z) - Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers [25.268709339109893]
We evaluate recent Large Language Models (LLMs) on the challenging task of summarizing short stories.
We work directly with authors to ensure that the stories have not been shared online (and therefore are unseen by the models)
We compare GPT-4, Claude-2.1, and LLama-2-70B and find that all three models make faithfulness mistakes in over 50% of summaries.
arXiv Detail & Related papers (2024-03-02T01:52:14Z) - A Comparative Analysis of Conversational Large Language Models in
Knowledge-Based Text Generation [5.661396828160973]
We conduct an empirical analysis of conversational large language models in generating natural language text from semantic triples.
We compare four large language models of varying sizes with different prompting techniques.
Our findings show that the capabilities of large language models in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques.
arXiv Detail & Related papers (2024-02-02T15:26:39Z) - Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - WANLI: Worker and AI Collaboration for Natural Language Inference
Dataset Creation [101.00109827301235]
We introduce a novel paradigm for dataset creation based on human and machine collaboration.
We use dataset cartography to automatically identify examples that demonstrate challenging reasoning patterns, and instruct GPT-3 to compose new examples with similar patterns.
The resulting dataset, WANLI, consists of 108,357 natural language inference (NLI) examples that present unique empirical strengths.
arXiv Detail & Related papers (2022-01-16T03:13:49Z) - STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story
Generation [48.56586847883825]
We introduce a dataset and evaluation platform built from STORIUM, an online collaborative storytelling community.
Our dataset contains 6K lengthy stories with fine-grained natural language annotations interspersed throughout each narrative.
We evaluate language models fine-tuned on our dataset by integrating them onto STORIUM, where real authors can query a model for suggested story continuations and then edit them.
arXiv Detail & Related papers (2020-10-04T23:26:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.