Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs
- URL: http://arxiv.org/abs/2409.11547v1
- Date: Tue, 17 Sep 2024 20:40:02 GMT
- Title: Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs
- Authors: Guillermo Marco, Luz Rello, Julio Gonzalo
- Abstract summary: We evaluate the creative fiction writing abilities of a fine-tuned small language model (SLM), BART Large, and compare its performance to humans and two large language models (LLMs): GPT-3.5 and GPT-4o.
- Score: 0.9831489366502301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we evaluate the creative fiction writing abilities of a fine-tuned small language model (SLM), BART Large, and compare its performance to humans and two large language models (LLMs): GPT-3.5 and GPT-4o. Our evaluation consists of two experiments: (i) a human evaluation where readers assess the stories generated by the SLM compared to human-written stories, and (ii) a qualitative linguistic analysis comparing the textual characteristics of the stories generated by the different models. In the first experiment, we asked 68 participants to rate short stories generated by the models and humans along dimensions such as grammaticality, relevance, creativity, and attractiveness. BART Large outperformed human writers in most aspects, except creativity, with an overall score of 2.11 compared to 1.85 for human-written texts -- a 14% improvement. In the second experiment, the qualitative analysis revealed that, while GPT-4o exhibited near-perfect internal and external coherence, it tended to produce more predictable narratives, with only 3% of its stories seen as novel. In contrast, 15% of BART's stories were considered novel, indicating a higher degree of creativity despite its smaller model size. This study provides both quantitative and qualitative insights into how model size and fine-tuning influence the balance between creativity, fluency, and coherence in creative writing tasks.
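As a quick check of the headline numbers, the sketch below reproduces the reported 14% figure from the two overall scores; only the 2.11 and 1.85 values come from the abstract, the rest is illustrative.

```python
# Reproduce the overall-score comparison reported in the abstract.
bart_overall = 2.11   # BART Large, overall human-evaluation score
human_overall = 1.85  # human-written stories, overall score

relative_gain = (bart_overall - human_overall) / human_overall
print(f"Relative improvement: {relative_gain:.1%}")  # ~14.1%, i.e. the reported 14%
```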
Related papers
- Evaluating Creative Short Story Generation in Humans and Large Language Models [0.7965327033045846]
Large language models (LLMs) have recently demonstrated the ability to generate high-quality stories.
We conduct a systematic analysis of creativity in short story generation across LLMs and everyday people.
Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.
arXiv Detail & Related papers (2024-11-04T17:40:39Z)
- Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
- AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text.
To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm.
Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
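To make the metric concrete, here is a rough sketch of a creativity-index-style score: the fraction of a text's tokens not covered by long verbatim spans that also occur in a reference corpus. The paper's DJ SEARCH is a more efficient dynamic-programming matcher with near-verbatim matching; the greedy exact-match version below, with its minimum span length of 5, is an illustrative assumption.

```python
def reference_ngrams(corpus_tokens, max_len=20):
    """All n-grams (as tuples) of a reference corpus, up to max_len tokens."""
    return {tuple(corpus_tokens[i:i + k])
            for k in range(1, max_len + 1)
            for i in range(len(corpus_tokens) - k + 1)}

def creativity_index(tokens, ref_ngrams, min_len=5):
    """Share of tokens NOT inside a verbatim span (>= min_len) found in the reference."""
    covered = [False] * len(tokens)
    for i in range(len(tokens)):
        length = 0  # longest exact reference match starting at position i
        while (i + length < len(tokens)
               and tuple(tokens[i:i + length + 1]) in ref_ngrams):
            length += 1
        if length >= min_len:
            covered[i:i + length] = [True] * length
    return 1.0 - sum(covered) / max(len(tokens), 1)
```

Under this reading, a higher value means fewer long spans are attributable to existing web text, which matches the direction of the 66.2% human-vs-LLM gap above.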
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
- A Character-Centric Creative Story Generation via Imagination [15.345466372805516]
We introduce a novel story generation framework called CCI (Character-centric Creative story generation via Imagination).
CCI features two modules for creative story generation: IG (Image-Guided Imagination) and MW (Multi-Writer model).
In the IG module, we utilize a text-to-image model to create visual representations of key story elements, such as characters, backgrounds, and main plots.
The MW module uses these story elements to generate multiple persona-description candidates and selects the best one to insert into the story, thereby enhancing the richness and depth of the narrative.
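A minimal sketch of how the two modules described above could be wired together; the module names follow the summary, but every function body is a hypothetical stand-in, since the paper's models and prompts are not reproduced here.

```python
def image_guided_imagination(elements):
    # IG: the paper uses a text-to-image model to visualize key story
    # elements; stubbed here as labeled placeholders.
    return {e: f"<image of {e}>" for e in elements}

def multi_writer(elements, visuals, n_candidates=3):
    # MW: generate several persona-description candidates and keep the best.
    candidates = [f"Candidate {i}: a story weaving together {', '.join(elements)}"
                  for i in range(n_candidates)]
    return max(candidates, key=len)  # placeholder selection criterion

elements = ["a reluctant hero", "a flooded city", "a lost map"]
story = multi_writer(elements, image_guided_imagination(elements))
```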
arXiv Detail & Related papers (2024-09-25T06:54:29Z)
- MirrorStories: Reflecting Diversity through Personalized Narrative Generation with Large Language Models [5.397565689903148]
This study explores the effectiveness of Large Language Models (LLMs) in creating personalized "mirror stories".
MirrorStories is a corpus of 1,500 personalized short stories generated by integrating elements such as name, gender, age, ethnicity, reader interest, and story moral.
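As a sketch of how that integration might be templated, the snippet below assembles a story prompt from the six elements named above; the template wording and the example reader are illustrative assumptions, not the paper's prompts.

```python
PROMPT = (
    "Write a short story whose protagonist is {name}, a {age}-year-old "
    "{gender} of {ethnicity} background who loves {interest}. "
    "The story should convey this moral: {moral}."
)

def mirror_story_prompt(reader):
    """Fill the template with one reader's identity elements."""
    return PROMPT.format(**reader)

reader = {"name": "Amara", "age": 12, "gender": "girl",
          "ethnicity": "Nigerian", "interest": "astronomy",
          "moral": "curiosity rewards patience"}
print(mirror_story_prompt(reader))
```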
arXiv Detail & Related papers (2024-09-20T22:43:13Z)
- Are Large Language Models Capable of Generating Human-Level Narratives? [114.34140090869175]
This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression.
We introduce a novel computational framework to analyze narratives through three discourse-level aspects.
We show that explicit integration of discourse features can enhance storytelling, as demonstrated by an over 40% improvement in neural storytelling.
arXiv Detail & Related papers (2024-07-18T08:02:49Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha).
Surprisingly, GPT-4 stories either surpassed, or were statistically indistinguishable from, highly rated human-written stories sourced from Reddit.
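The 0.72 agreement figure is Krippendorff's alpha over the human ratings. A minimal sketch of that computation using the third-party krippendorff package; the ratings matrix below is a made-up illustration, not the paper's data.

```python
import numpy as np
import krippendorff  # pip install krippendorff

# Rows are raters, columns are stories; np.nan marks a missing rating.
ratings = np.array([
    [4, 3, 5, 2, np.nan],
    [4, 3, 4, 2, 1],
    [5, 3, 5, np.nan, 1],
])
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```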
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
- Probing the Creativity of Large Language Models: Can models produce divergent semantic association? [9.992602859777689]
The present study investigates the creative thinking of large language models from a cognitive perspective.
We utilize the divergent association task (DAT), an objective measurement of creativity that asks models to generate unrelated words and calculates the semantic distance between them.
Our results imply that advanced large language models have divergent semantic associations, which is a fundamental process underlying creativity.
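To illustrate the measurement, the sketch below scores a word set the way the DAT does: average pairwise cosine distance between word embeddings, scaled by 100 (the published task scores seven valid nouns using GloVe vectors). The random vectors here stand in for real embeddings, which this sketch does not load.

```python
import itertools
import numpy as np

def dat_score(vectors):
    """Mean pairwise cosine distance between word vectors, scaled by 100."""
    dists = [1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
             for a, b in itertools.combinations(vectors, 2)]
    return 100.0 * float(np.mean(dists))

rng = np.random.default_rng(0)
toy_vectors = [rng.normal(size=50) for _ in range(7)]  # stand-ins for GloVe vectors
print(f"DAT-style score: {dat_score(toy_vectors):.1f}")
```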
arXiv Detail & Related papers (2023-10-17T11:23:32Z)
- Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration.
Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals.
arXiv Detail & Related papers (2023-09-25T22:02:46Z)
- Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5 [0.0]
GPT-3.5 is an LLM that powers the conversational agent ChatGPT.
In this work, we used a series of novel prompts to determine whether ChatGPT shows biases and other decision effects.
We also tested the same prompts on human participants.
arXiv Detail & Related papers (2023-05-08T01:02:52Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- Training Language Models with Language Feedback at Scale [50.70091340506957]
We introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback.
ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that best incorporates the feedback; and third, fine-tuning the model on the chosen refinements.
We show theoretically that ILF can be viewed as Bayesian Inference, similar to Reinforcement Learning from Human Feedback (RLHF).
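A minimal sketch of one ILF round under the three steps listed above; generate, refine, and select_best are hypothetical stand-ins for the paper's language-model calls, and only the loop structure follows the summary.

```python
def generate(prompt):  # stand-in for sampling an initial LM output
    return f"draft for: {prompt}"

def refine(inp, draft, feedback, k=3):  # step 1: feedback-conditioned refinements
    return [f"{draft} (revised per '{feedback}', v{i})" for i in range(k)]

def select_best(candidates, feedback):  # step 2: choose the best refinement
    return max(candidates, key=lambda c: feedback in c)  # placeholder scorer

def ilf_round(dataset):
    refinements = []
    for inp, feedback in dataset:
        draft = generate(inp)
        best = select_best(refine(inp, draft, feedback), feedback)
        refinements.append((inp, best))
    # step 3: fine-tune the model on the (input, refinement) pairs (omitted)
    return refinements

print(ilf_round([("write a haiku about rain", "make it more concrete")]))
```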
arXiv Detail & Related papers (2023-03-28T17:04:15Z)
- The Next Chapter: A Study of Large Language Models in Storytelling [51.338324023617034]
The application of prompt-based learning with large language models (LLMs) has exhibited remarkable performance in diverse natural language processing (NLP) tasks.
This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models.
The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models.
arXiv Detail & Related papers (2023-01-24T02:44:02Z)
- Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models [95.88620740809004]
We study differences in the narrative flow of events in autobiographical versus imagined stories using GPT-3.
We found that imagined stories have higher sequentiality than autobiographical stories.
In comparison to imagined stories, autobiographical stories contain more concrete words and words related to the first person.
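The sequentiality comparison above rests on a simple quantity: how much a model's log-likelihood of each sentence improves when it is conditioned on the preceding sentences in addition to the story topic. A sketch under that reading, where the scoring function is a toy stand-in for a call to a model such as GPT-3:

```python
def sequentiality(sentences, topic, log_prob):
    """Mean per-sentence log-likelihood gain from seeing prior context."""
    gains = []
    for i, sent in enumerate(sentences):
        history = " ".join(sentences[:i])
        gains.append(log_prob(sent, condition=f"{topic} {history}")
                     - log_prob(sent, condition=topic))
    return sum(gains) / len(gains)

def toy_log_prob(sentence, condition):
    # Toy scorer: more vocabulary shared with the condition -> higher score.
    return len(set(sentence.split()) & set(condition.split())) - len(sentence.split())

story = ["Maya found the letter.", "The letter was from her sister.",
         "Her sister had vanished years ago."]
print(sequentiality(story, topic="a family mystery", log_prob=toy_log_prob))
```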
arXiv Detail & Related papers (2022-01-07T20:10:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.