Evaluating the Creativity of LLMs in Persian Literary Text Generation
- URL: http://arxiv.org/abs/2509.18401v1
- Date: Mon, 22 Sep 2025 20:32:56 GMT
- Title: Evaluating the Creativity of LLMs in Persian Literary Text Generation
- Authors: Armin Tourajmehr, Mohammad Reza Modarres, Yadollah Yaghoobzadeh
- Abstract summary: We build a dataset of user-generated Persian literary texts spanning 20 diverse topics. We assess model outputs along four creativity dimensions (originality, fluency, flexibility, and elaboration) by adapting the Torrance Tests of Creative Thinking.
- Score: 5.067768639196139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have demonstrated notable creative abilities in generating literary texts, including poetry and short stories. However, prior research has primarily centered on English, with limited exploration of non-English literary traditions and without standardized methods for assessing creativity. In this paper, we evaluate the capacity of LLMs to generate Persian literary text enriched with culturally relevant expressions. We build a dataset of user-generated Persian literary texts spanning 20 diverse topics and assess model outputs along four creativity dimensions (originality, fluency, flexibility, and elaboration) by adapting the Torrance Tests of Creative Thinking. To reduce evaluation costs, we adopt an LLM as a judge for automated scoring and validate its reliability against human judgments using intraclass correlation coefficients, observing strong agreement. In addition, we analyze the models' ability to understand and employ four core literary devices: simile, metaphor, hyperbole, and antithesis. Our results highlight both the strengths and limitations of LLMs in Persian literary text generation, underscoring the need for further refinement.
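The abstract's judge-validation step can be sketched concretely. The snippet below is a minimal illustration (not the authors' code) of computing an intraclass correlation coefficient, here the ICC(3,1) form (two-way mixed effects, consistency, single rater), between human and LLM-judge creativity scores; the scores shown are hypothetical.

```python
import numpy as np

def icc3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single rater.

    `ratings` is an (n_subjects, k_raters) array, e.g. one column of
    human scores and one column of LLM-judge scores per text.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA decomposition: subjects (rows), raters (cols), residual
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-subjects mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical 1-5 creativity scores on six texts
human = [4, 2, 5, 3, 1, 4]
judge = [4, 3, 5, 3, 2, 4]
print(icc3_1(np.column_stack([human, judge])))
```

Values near 1 indicate that the LLM judge ranks texts consistently with the human raters; other ICC forms (absolute agreement, averaged raters) would be appropriate depending on how the study aggregates scores.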
Related papers
- Capabilities and Evaluation Biases of Large Language Models in Classical Chinese Poetry Generation: A Case Study on Tang Poetry [4.720025219010595]
Large Language Models (LLMs) are increasingly applied to creative domains, yet their performance in classical Chinese poetry generation and evaluation remains poorly understood. We propose a three-step evaluation framework that combines computational metrics, LLM-as-a-judge assessment, and human expert validation.
arXiv Detail & Related papers (2025-10-17T05:00:37Z) - Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation [70.43884512651668]
We formalize Genette's (1987) theory of paratexts from literary and translation studies to introduce the task of paratextual explicitation for machine translation. We construct a dataset of 560 expert-aligned paratexts from four English translations of the classical Chinese short story collection Liaozhai. Our findings demonstrate the potential of paratextual explicitation in advancing machine translation beyond linguistic equivalence.
arXiv Detail & Related papers (2025-09-27T16:27:36Z) - Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models [2.7323591332394166]
GLASS (Greimas Literary Analysis via Semiotic Square) is a structured analytical framework based on the Greimas Semiotic Square (GSS). GLASS facilitates the rapid dissection of narrative structures and deep meanings in narrative works. This research provides an AI-based tool for literary research and education, offering insights into the cognitive mechanisms underlying literary engagement.
arXiv Detail & Related papers (2025-06-26T15:10:24Z) - A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models [100.16387798660833]
The Oogiri game is a creativity-driven task requiring humor and associative thinking. LoTbench is an interactive, causality-aware evaluation framework. Results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable.
arXiv Detail & Related papers (2025-01-25T09:11:15Z) - Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition [2.048226951354646]
Large language models (LLMs) have emerged as a potential solution to automate the complex processes involved in writing literature reviews. This study introduces a framework to automatically evaluate the performance of LLMs in three key tasks of literature writing.
arXiv Detail & Related papers (2024-12-18T08:42:25Z) - A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - A Perspective on Literary Metaphor in the Context of Generative AI [0.6445605125467572]
This study explores the role of literary metaphor and its capacity to generate a range of meanings.
To investigate whether the inclusion of original figurative language improves textual quality, we trained an LSTM-based language model in Afrikaans.
The paper raises thought-provoking questions on aesthetic value, interpretation and evaluation.
arXiv Detail & Related papers (2024-09-02T08:27:29Z) - Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z) - LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 works of literary fiction that were either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fiction (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z) - Evaluating Large Language Model Creativity from a Literary Perspective [13.672268920902187]
This paper assesses the potential for large language models to serve as assistive tools in the creative writing process.
We develop interactive and multi-voice prompting strategies that interleave background descriptions, instructions that guide composition, samples of text in the target style, and critical discussion of the given samples.
arXiv Detail & Related papers (2023-11-30T16:46:25Z) - Art or Artifice? Large Language Models and the False Promise of Creativity [53.04834589006685]
We propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration.
Our analysis shows that LLM-generated stories pass 3-10X fewer TTCW tests than stories written by professionals.
arXiv Detail & Related papers (2023-09-25T22:02:46Z) - Neural Authorship Attribution: Stylometric Analysis on Large Language Models [16.63955074133222]
Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text.
With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics.
arXiv Detail & Related papers (2023-08-14T17:46:52Z)