Evaluating Creative Short Story Generation in Humans and Large Language Models
- URL: http://arxiv.org/abs/2411.02316v4
- Date: Tue, 04 Mar 2025 09:10:59 GMT
- Title: Evaluating Creative Short Story Generation in Humans and Large Language Models
- Authors: Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas,
- Abstract summary: Large language models (LLMs) have demonstrated the ability to generate high-quality stories, but their creative story-writing capabilities remain under-explored. We conduct a systematic analysis of creativity in short story generation across 60 LLMs and 60 people using a five-sentence creative story-writing task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Story-writing is a fundamental aspect of human imagination, relying heavily on creativity to produce narratives that are novel, effective, and surprising. While large language models (LLMs) have demonstrated the ability to generate high-quality stories, their creative story-writing capabilities remain under-explored. In this work, we conduct a systematic analysis of creativity in short story generation across 60 LLMs and 60 people using a five-sentence creative story-writing task. We use measures to automatically evaluate model- and human-generated stories across several dimensions of creativity, including novelty, surprise, diversity, and linguistic complexity. We also collect creativity ratings and Turing Test classifications from non-expert and expert human raters and LLMs. Automated metrics show that LLMs generate stylistically complex stories, but tend to fall short in terms of novelty, surprise and diversity when compared to average human writers. Expert ratings generally coincide with automated metrics. However, LLMs and non-experts rate LLM stories to be more creative than human-generated stories. We discuss why and how these differences in ratings occur, and their implications for both human and artificial creativity.
Related papers
- Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs [26.682827310724363]
We examine two state-of-the-art large language models (LLMs) on story generation.
We find that LLM-generated stories often consist of plot elements that are echoed across a number of generations.
We introduce the Sui Generis score, which estimates how unlikely a plot element is to appear in alternative storylines.
arXiv Detail & Related papers (2024-12-31T04:54:48Z)
- AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text.
To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm.
Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
- Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents.
We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z)
- A Character-Centric Creative Story Generation via Imagination [15.345466372805516]
We introduce a novel story generation framework called CCI (Character-centric Creative story generation via Imagination).
CCI features two modules for creative story generation: IG (Image-Guided Imagination) and MW (Multi-Writer model).
In the IG module, we utilize a text-to-image model to create visual representations of key story elements, such as characters, backgrounds, and main plots.
The MW module uses these story elements to generate multiple persona-description candidates and selects the best one to insert into the story, thereby enhancing the richness and depth of the narrative.
arXiv Detail & Related papers (2024-09-25T06:54:29Z)
- Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs [0.9831489366502301]
We evaluate the creative fiction writing abilities of a fine-tuned small language model (SLM), BART Large, and compare its performance to humans and two large language models (LLMs): GPT-3.5 and GPT-4o.
arXiv Detail & Related papers (2024-09-17T20:40:02Z)
- Are Large Language Models Capable of Generating Human-Level Narratives? [114.34140090869175]
This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression.
We introduce a novel computational framework to analyze narratives through three discourse-level aspects.
We show that explicit integration of discourse features can enhance storytelling, as demonstrated by an over 40% improvement in neural storytelling.
arXiv Detail & Related papers (2024-07-18T08:02:49Z)
- The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario [12.852843553759744]
We evaluate recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task.
We use a specifically-tailored prompt (based on an epic combat between Ignatius J. Reilly and a pterodactyl) to minimize the risk of training data leakage.
Evaluation is performed by humans using a detailed rubric covering aspects such as fluency, style, originality, and humor.
arXiv Detail & Related papers (2024-06-22T17:01:59Z)
- Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (Krippendorff's alpha = 0.72).
Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
- Divergent Creativity in Humans and Large Language Models [37.67363469600804]
The recent surge in the capabilities of Large Language Models has led to claims that they are approaching a level of creativity akin to human capabilities.
We leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans.
arXiv Detail & Related papers (2024-05-13T22:37:52Z)
- Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration.
Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals.
arXiv Detail & Related papers (2023-09-25T22:02:46Z)
- On the Creativity of Large Language Models [2.4555276449137042]
Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence.
This article first analyzes the development of LLMs under the lens of creativity theories.
Then, we consider different classic perspectives, namely product, process, press, and person.
Finally, we examine the societal impact of these technologies with a particular focus on the creative industries.
arXiv Detail & Related papers (2023-03-27T18:00:01Z)
- The Next Chapter: A Study of Large Language Models in Storytelling [51.338324023617034]
The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks.
This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models.
The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
arXiv Detail & Related papers (2023-01-24T02:44:02Z)
- Computational Storytelling and Emotions: A Survey [56.95572957863576]
This survey paper is intended to summarize and contribute to the development of research being conducted on the relationship between stories and emotions.
We believe the goal of creativity research is not to replace humans with computers, but to find ways for humans and computers to collaborate to enhance creativity.
arXiv Detail & Related papers (2022-05-23T00:21:59Z)
- Collaborative Storytelling with Large-scale Neural Language Models [6.0794985566317425]
We introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it.
We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far.
arXiv Detail & Related papers (2020-11-20T04:36:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.