Related papers: Evaluating Creative Short Story Generation in Humans and Large Language Models

URL: http://arxiv.org/abs/2411.02316v2
Date: Wed, 06 Nov 2024 23:27:24 GMT
Title: Evaluating Creative Short Story Generation in Humans and Large Language Models
Authors: Mete Ismayilzada, Claire Stevenson, Lonneke van der Plas
Abstract summary: Large language models (LLMs) have recently demonstrated the ability to generate high-quality stories. We conduct a systematic analysis of creativity in short story generation across LLMs and everyday people. Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.
Score: 0.7965327033045846
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Storytelling is a fundamental aspect of human communication, relying heavily on creativity to produce narratives that are novel, appropriate, and surprising. While large language models (LLMs) have recently demonstrated the ability to generate high-quality stories, their creative capabilities remain underexplored. Previous research has either focused on creativity tests requiring short responses or primarily compared model performance in story generation to that of professional writers. However, the question of whether LLMs exhibit creativity in writing short stories on par with the average human remains unanswered. In this work, we conduct a systematic analysis of creativity in short story generation across LLMs and everyday people. Using a five-sentence creative story task, commonly employed in psychology to assess human creativity, we automatically evaluate model- and human-generated stories across several dimensions of creativity, including novelty, surprise, and diversity. Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.

Related papers

CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate LLM creativity across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z)
Echoes in AI: Quantifying Lack of Plot Diversity in LLM Outputs [26.682827310724363]
We examine two state-of-the-art large language models (LLMs) on story generation. We find that LLM-generated stories often consist of plot elements that are echoed across a number of generations. We introduce the Sui Generis score, which estimates how unlikely a plot element is to appear in alternative storylines.
arXiv Detail & Related papers (2024-12-31T04:54:48Z)
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
Agents' Room: Narrative Generation through Multi-step Collaboration [54.98886593802834]
We propose a generation framework inspired by narrative theory that decomposes narrative writing into subtasks tackled by specialized agents. We show that Agents' Room generates stories preferred by expert evaluators over those produced by baseline systems.
arXiv Detail & Related papers (2024-10-03T15:44:42Z)
A Character-Centric Creative Story Generation via Imagination [15.345466372805516]
We introduce a novel story generation framework called CCI (Character-centric Creative story generation via Imagination). CCI features two modules for creative story generation: IG (Image-Guided Imagination) and MW (Multi-Writer model). In the IG module, we utilize a text-to-image model to create visual representations of key story elements, such as characters, backgrounds, and main plots. The MW module uses these story elements to generate multiple persona-description candidates and selects the best one to insert into the story, thereby enhancing the richness and depth of the narrative.
arXiv Detail & Related papers (2024-09-25T06:54:29Z)
Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs [0.9831489366502301]
We evaluate the creative fiction writing abilities of a fine-tuned small language model (SLM), BART Large, and compare its performance to humans and two large language models (LLMs): GPT-3.5 and GPT-4o.
arXiv Detail & Related papers (2024-09-17T20:40:02Z)
Are Large Language Models Capable of Generating Human-Level Narratives? [114.34140090869175]
This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects. We show that explicit integration of discourse features can enhance storytelling, as is demonstrated by over 40% improvement in neural storytelling.
arXiv Detail & Related papers (2024-07-18T08:02:49Z)
The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario [12.852843553759744]
We evaluate recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task. We use a specifically tailored prompt (based on an epic combat between Ignatius J. Reilly and a pterodactyl) to minimize the risk of training data leakage. Evaluation is performed by humans using a detailed rubric including various aspects like fluency, style, originality or humor.
arXiv Detail & Related papers (2024-06-22T17:01:59Z)
Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories. We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha). Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
Divergent Creativity in Humans and Large Language Models [37.67363469600804]
The recent surge in the capabilities of Large Language Models has led to claims that they are approaching a level of creativity akin to human capabilities. We leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans.
arXiv Detail & Related papers (2024-05-13T22:37:52Z)
Characterising the Creative Process in Humans and Large Language Models [6.363158395541767]
We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task. We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity. Though LLMs as a population match human profiles, their relationship with creativity is different, where the more flexible models score higher on creativity.
arXiv Detail & Related papers (2024-05-01T23:06:46Z)
Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals.
arXiv Detail & Related papers (2023-09-25T22:02:46Z)
On the Creativity of Large Language Models [2.4555276449137042]
Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence. This article first analyzes the development of LLMs under the lens of creativity theories. Then, we consider different classic perspectives, namely product, process, press, and person. Finally, we examine the societal impact of these technologies with a particular focus on the creative industries.
arXiv Detail & Related papers (2023-03-27T18:00:01Z)
The Next Chapter: A Study of Large Language Models in Storytelling [51.338324023617034]
The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks. This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models. The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
arXiv Detail & Related papers (2023-01-24T02:44:02Z)
Computational Storytelling and Emotions: A Survey [56.95572957863576]
This survey paper is intended to summarize and contribute to the development of research being conducted on the relationship between stories and emotions. We believe the aim of creativity research is not to replace humans with computers, but to find ways for humans and computers to collaborate to enhance creativity.
arXiv Detail & Related papers (2022-05-23T00:21:59Z)
Collaborative Storytelling with Large-scale Neural Language Models [6.0794985566317425]
We introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it. We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far.
arXiv Detail & Related papers (2020-11-20T04:36:54Z)
