Related papers: Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?

Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?

URL: http://arxiv.org/abs/2407.01119v1
Date: Mon, 1 Jul 2024 09:28:58 GMT
Title: Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
Authors: Guillermo Marco, Julio Gonzalo, Ramón del Castillo, María Teresa Mateo Girona,
Abstract summary: Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks. We have carried out a contest between Patricio Pron and GPT-4, in the spirit of AI-human duels such as DeepBlue vs Kasparov and AlphaGo vs Lee Sidol. The results indicate that LLMs are still far from challenging a top human creative writer.
Score: 0.8999666725996975
License: http://creativecommons.org/licenses/by/4.0/
Abstract: It has become routine to report research results where Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks, and creative text writing is no exception. It seems natural, then, to raise the bid: Are LLMs ready to compete in creative writing skills with a top (rather than average) novelist? To provide an initial answer for this question, we have carried out a contest between Patricio Pron (an awarded novelist, considered one of the best of his generation) and GPT-4 (one of the top performing LLMs), in the spirit of AI-human duels such as DeepBlue vs Kasparov and AlphaGo vs Lee Sidol. We asked Pron and GPT-4 to provide thirty titles each, and then to write short stories for both their titles and their opponent's. Then, we prepared an evaluation rubric inspired by Boden's definition of creativity, and we collected 5,400 manual assessments provided by literature critics and scholars. The results of our experimentation indicate that LLMs are still far from challenging a top human creative writer, and that reaching such level of autonomous creative writing skills probably cannot be reached simply with larger language models.

Related papers

Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback [57.200668979963694]
We present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues.<n>We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics.
arXiv Detail & Related papers (2025-07-21T18:56:50Z)
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations [53.950760059792614]
Large Language Models (LLMs) excel at countless tasks, yet struggle with creativity. We introduce a novel approach that couples LLMs with structured representations and cognitively inspired manipulations to generate more creative and diverse ideas. We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z)
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages [51.96666324242191]
We analyze whether user utilization of novel writing assistants in a charity advertisement writing task is affected by the AI's performance in a second language. We quantify the extent to which these patterns translate into the persuasiveness of generated charity advertisements.
arXiv Detail & Related papers (2025-02-13T17:49:30Z)
Large Language Models show both individual and collective creativity comparable to humans [39.90254321453145]
Large Language Models (LLMs) show creativity comparable to humans. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. When questioned 10 times, an LLM's collective creativity is equivalent to 8-10 humans.
arXiv Detail & Related papers (2024-12-04T09:18:54Z)
"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models [97.22914355737676]
We examine whether and how writers want to preserve their authentic voice when co-writing with AI tools. Our findings illuminate conceptions of authenticity in human-AI co-creation. Readers' responses showed less concern about human-AI co-writing.
arXiv Detail & Related papers (2024-11-20T04:42:32Z)
Evaluating Creative Short Story Generation in Humans and Large Language Models [0.7965327033045846]
Large language models (LLMs) have recently demonstrated the ability to generate high-quality stories. We conduct a systematic analysis of creativity in short story generation across LLMs and everyday people. Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.
arXiv Detail & Related papers (2024-11-04T17:40:39Z)
Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called textbfDictionary textbfInsertion textbfPrompting (textbfDIP) When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs. It then enables better translation into English and better English model thinking steps which leads to obviously better results.
arXiv Detail & Related papers (2024-11-02T05:10:50Z)
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario [12.852843553759744]
We evaluate recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task. We use a specifically-tailored prompt (based on an epic combat between Ignatius J. Reilly and a pterodactyl) to minimize the risk of training data leakage. evaluation is performed by humans using a detailed rubric including various aspects like fluency, style, originality or humor.
arXiv Detail & Related papers (2024-06-22T17:01:59Z)
HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing [45.95600225239927]
Large language models (LLMs) can hardly produce written works at the level of human experts due to the extremely high complexity of literature writing. We present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting.
arXiv Detail & Related papers (2024-06-17T16:01:33Z)
LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 literary fictions that are either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries. We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions. We conduct an in-depth analysis to ascertain how specific attributes of literary fictions (e.g., novel types, character numbers, the year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test [8.539465812580612]
We propose ProFTAP, a novel evaluation framework inspired by Turing test to assess AI's poetry writing capability. We find that recent large language models (LLMs) do indeed possess the ability to write classical Chinese poems nearly indistinguishable from those of humans.
arXiv Detail & Related papers (2024-01-10T06:21:47Z)
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing [0.0]
We evaluate recent LLMs on English creative writing, a challenging and complex task that requires imagination, coherence, and style. We ask several LLMs and humans to write such a story and conduct a human evalution involving various criteria such as originality, humor, and style. Our results show that some state-of-the-art commercial LLMs match or slightly outperform our writers in most dimensions; whereas open-source LLMs lag behind.
arXiv Detail & Related papers (2023-10-12T15:56:24Z)
Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. Our analysis shows that LLM-generated stories pass 3-10X less TTCW tests than stories written by professionals.
arXiv Detail & Related papers (2023-09-25T22:02:46Z)
Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers [9.120878749348986]
Natural language generation (NLG) using neural language models has brought us closer than ever to the goal of building AI-powered creative writing tools. Recent developments in natural language generation using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools.
arXiv Detail & Related papers (2022-11-09T17:00:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.