Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- URL: http://arxiv.org/abs/2407.01119v2
- Date: Mon, 28 Oct 2024 16:32:01 GMT
- Title: Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing?
- Authors: Guillermo Marco, Julio Gonzalo, Ramón del Castillo, María Teresa Mateo Girona
- Abstract summary: Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks.
We have carried out a contest between Patricio Pron and GPT-4, in the spirit of AI-human duels such as Deep Blue vs Kasparov and AlphaGo vs Lee Sedol.
The results indicate that LLMs are still far from challenging a top human creative writer.
- Score: 0.8999666725996975
- Abstract: It has become routine to report research results where Large Language Models (LLMs) outperform average humans in a wide range of language-related tasks, and creative text writing is no exception. It seems natural, then, to raise the bid: are LLMs ready to compete in creative writing skills with a top (rather than average) novelist? To provide an initial answer to this question, we carried out a contest between Patricio Pron (an award-winning novelist, considered one of the best of his generation) and GPT-4 (one of the top-performing LLMs), in the spirit of AI-human duels such as Deep Blue vs Kasparov and AlphaGo vs Lee Sedol. We asked Pron and GPT-4 to provide thirty titles each, and then to write short stories for both their own titles and their opponent's. We then prepared an evaluation rubric inspired by Boden's definition of creativity, and collected 5,400 manual assessments provided by literature critics and scholars. The results of our experiments indicate that LLMs are still far from challenging a top human creative writer, and that such a level of autonomous creative writing skill probably cannot be reached simply with larger language models.
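As a quick sanity check on the scale of the experiment, the numbers in the abstract can be worked out directly; a minimal sketch, where the per-story assessment count is inferred rather than stated:

```python
# Contest bookkeeping for the Pron vs GPT-4 duel, from the abstract.
# Assumption: each contestant wrote a story for every title (their own
# thirty plus their opponent's thirty).
titles_per_author = 30
authors = 2

total_titles = titles_per_author * authors       # 60 titles
total_stories = total_titles * authors           # 120 stories in play
total_assessments = 5400                         # reported in the abstract

per_story = total_assessments / total_stories    # inferred: 45 per story
print(total_titles, total_stories, per_story)    # 60 120 45.0
```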
Related papers
- Evaluating Creative Short Story Generation in Humans and Large Language Models [0.7965327033045846]
Large language models (LLMs) have recently demonstrated the ability to generate high-quality stories.
We conduct a systematic analysis of creativity in short story generation across LLMs and everyday people.
Our findings reveal that while LLMs can generate stylistically complex stories, they tend to fall short in terms of creativity when compared to average human writers.
arXiv Detail & Related papers (2024-11-04T17:40:39Z)
- Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel, simple yet effective method called Dictionary Insertion Prompting (DIP).
When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs.
This enables better translation into English and more effective reasoning in English, which leads to clearly better results.
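A minimal sketch of the idea, assuming a toy bilingual dictionary and an illustrative gloss format (the paper's actual insertion format may differ):

```python
# Toy Dictionary Insertion Prompting (DIP): annotate a non-English prompt
# with English glosses so the model can translate and reason in English.
TOY_DICT = {"el": "the", "gato": "cat", "duerme": "sleeps",
            "sobre": "on", "la": "the", "mesa": "table"}

def dip_prompt(prompt: str) -> str:
    """Insert English counterparts after each dictionary hit."""
    out = []
    for word in prompt.split():
        gloss = TOY_DICT.get(word.lower().strip(".,?!"))
        out.append(f"{word} ({gloss})" if gloss else word)
    return " ".join(out)

print(dip_prompt("El gato duerme sobre la mesa."))
# El (the) gato (cat) duerme (sleeps) sobre (on) la (the) mesa. (table)
```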
arXiv Detail & Related papers (2024-11-02T05:10:50Z)
- AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text.
To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm.
Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
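A toy illustration of the underlying idea, i.e. scoring how much of a text is not attributable to a reference corpus; this is a naive n-gram version, not the paper's DJ SEARCH algorithm, and the corpus and span length are assumptions:

```python
# Naive novelty score: the fraction of a text's 3-grams that do NOT
# occur verbatim in a reference corpus (higher = more novel). The real
# CREATIVITY INDEX attributes spans against web-scale text via DJ SEARCH.
def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def toy_novelty(text: str, corpus: str, n: int = 3) -> float:
    text_grams = ngrams(text.lower().split(), n)
    corpus_grams = ngrams(corpus.lower().split(), n)
    return len(text_grams - corpus_grams) / len(text_grams) if text_grams else 0.0

corpus = "the cat sat on the mat and looked at the moon"
print(toy_novelty("the cat sat on a violin made of rain", corpus))  # ~0.71
```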
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
- The Unlikely Duel: Evaluating Creative Writing in LLMs through a Unique Scenario [12.852843553759744]
We evaluate recent state-of-the-art, instruction-tuned large language models (LLMs) on an English creative writing task.
We use a specifically tailored prompt (based on an epic combat between Ignatius J. Reilly and a pterodactyl) to minimize the risk of training data leakage.
Evaluation is performed by humans using a detailed rubric covering aspects such as fluency, style, originality, and humor.
arXiv Detail & Related papers (2024-06-22T17:01:59Z)
- HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing [45.95600225239927]
Large language models (LLMs) still struggle to produce written works at the level of human experts, owing to the extreme complexity of literary writing.
We present HoLLMwood, an automated framework for unleashing the creativity of LLMs and exploring their potential in screenwriting.
arXiv Detail & Related papers (2024-06-17T16:01:33Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 works of literary fiction, either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis of how specific attributes of the fictions (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations.
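A hypothetical record layout for such an evaluation item; the field names below are assumptions based on the attributes mentioned in the summary, not the dataset's actual schema:

```python
# Sketch of an LFED-style question record (field names are assumed).
from dataclasses import dataclass

@dataclass
class LFEDQuestion:
    fiction_title: str      # one of the 95 collected works
    novel_type: str         # attribute analyzed for its effect on scores
    num_characters: int     # number of characters in the fiction
    publication_year: int
    category: str           # one of the 8 question categories
    question: str
    reference_answer: str
```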
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
- Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test [8.539465812580612]
We propose ProFTAP, a novel evaluation framework inspired by the Turing test to assess AI's poetry-writing capability.
We find that recent large language models (LLMs) do indeed possess the ability to write classical Chinese poems nearly indistinguishable from those of humans.
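A sketch of the Turing-test-style measurement such a framework implies: judges label each poem as human- or AI-written, and detection accuracy near chance (0.5) means the AI poems are nearly indistinguishable. The data here are illustrative, not ProFTAP's actual protocol:

```python
# (true_author, judge_guess) pairs -- illustrative data only.
judgments = [("ai", "human"), ("ai", "ai"), ("human", "human"),
             ("human", "ai"), ("ai", "human"), ("human", "human")]

accuracy = sum(true == guess for true, guess in judgments) / len(judgments)
print(f"detection accuracy: {accuracy:.2f}")  # 0.50 -> at chance level
```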
arXiv Detail & Related papers (2024-01-10T06:21:47Z)
- Instruction-Following Evaluation for Large Language Models [52.90926820437014]
We introduce Instruction-Following Eval (IFEval) for large language models.
IFEval is a straightforward and easy-to-reproduce evaluation benchmark.
We report evaluation results for two widely available LLMs.
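IFEval builds on instructions whose satisfaction can be verified programmatically (e.g., length or keyword constraints). A minimal sketch in that spirit, not the benchmark's actual implementation:

```python
# Two verifiable instruction checks in the spirit of IFEval.
def min_words_ok(response: str, n: int) -> bool:
    return len(response.split()) >= n

def keyword_count_ok(response: str, keyword: str, at_least: int) -> bool:
    return response.lower().count(keyword.lower()) >= at_least

response = "The moon rose over the harbor. The moon was pale and enormous."
print(min_words_ok(response, 10))              # True (12 words)
print(keyword_count_ok(response, "moon", 2))   # True
```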
arXiv Detail & Related papers (2023-11-14T05:13:55Z)
- A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing [0.0]
We evaluate recent LLMs on English creative writing, a challenging and complex task that requires imagination, coherence, and style.
We ask several LLMs and humans to write such a story and conduct a human evaluation involving criteria such as originality, humor, and style.
Our results show that some state-of-the-art commercial LLMs match or slightly outperform our writers in most dimensions; whereas open-source LLMs lag behind.
arXiv Detail & Related papers (2023-10-12T15:56:24Z)
- Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration.
Our analysis shows that LLM-generated stories pass 3-10x fewer TTCW tests than stories written by professionals.
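A minimal sketch of how TTCW-style results aggregate: 14 binary verdicts grouped into the four dimensions; the per-dimension test counts and sample verdicts below are assumptions for illustration, not the actual TTCW assignment:

```python
# Aggregate 14 binary creativity tests into four TTCW dimensions.
# The grouping (5/3/3/3) and verdicts are assumed for illustration.
tests = {
    "Fluency":     [True, True, False, True, True],
    "Flexibility": [True, False, False],
    "Originality": [False, True, False],
    "Elaboration": [True, True, False],
}
assert sum(len(v) for v in tests.values()) == 14

for dim, verdicts in tests.items():
    print(f"{dim}: {sum(verdicts)}/{len(verdicts)} tests passed")
print(f"overall: {sum(sum(v) for v in tests.values())}/14")  # 8/14
```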
arXiv Detail & Related papers (2023-09-25T22:02:46Z)
- Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers [9.120878749348986]
Recent developments in natural language generation (NLG) using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools.
arXiv Detail & Related papers (2022-11-09T17:00:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.