Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
- URL: http://arxiv.org/abs/2509.22641v1
- Date: Fri, 26 Sep 2025 17:59:05 GMT
- Title: Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity
- Authors: Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty
- Abstract summary: N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. We investigate the relationship between this notion of creativity and n-gram novelty through close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, 91% of top-quartile expressions by n-gram novelty are not judged as creative.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, as it does not account for creativity's dual nature: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We investigate the relationship between this notion of creativity and n-gram novelty through 7542 expert writer annotations (n=26) of novelty, pragmaticality, and sensicality via close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, ~91% of top-quartile expressions by n-gram novelty are not judged as creative, cautioning against relying on n-gram novelty alone. Furthermore, unlike human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier closed-source models, we additionally confirm that they are less likely to produce creative expressions than humans. Using our dataset, we test whether zero-shot, few-shot, and finetuned models are able to identify creative expressions (a positive aspect of writing) and non-pragmatic ones (a negative aspect). Overall, frontier LLMs exhibit performance much higher than random but leave room for improvement, especially struggling to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty scores from the best-performing model were predictive of expert writer preferences.
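As a concrete illustration of the metric under critique, n-gram novelty can be computed as the fraction of a text's word n-grams that do not occur in a reference corpus. The following is a toy sketch, assuming whitespace tokenization and a short string standing in for a model's training data:

```python
def ngram_novelty(text: str, corpus: str, n: int = 5) -> float:
    """Fraction of the text's word n-grams that are absent from the corpus.

    Toy sketch only: real evaluations tokenize more carefully and match
    against a model's full training corpus, not a short string.
    """
    def ngrams(tokens, n):
        # Set of all contiguous n-token windows.
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    text_grams = ngrams(text.split(), n)
    if not text_grams:
        return 0.0
    corpus_grams = ngrams(corpus.split(), n)
    return len(text_grams - corpus_grams) / len(text_grams)
```

With n=3, `ngram_novelty("the cat sat on the moon", "the cat sat on the mat", n=3)` returns 0.25, since only the final trigram is unseen; note the score says nothing about whether the novel phrase is sensical or pragmatic, which is exactly the gap the paper highlights.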
Related papers
- LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers [1.9036571490366498]
We show that human writing consistently exhibits significantly higher uncertainty than model outputs. We find that instruction-tuned and reasoning models exacerbate this trend compared to their base counterparts.
arXiv Detail & Related papers (2026-02-18T03:19:12Z)
- Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models [6.036586911740041]
Large language models (LLMs) are increasingly used in verbal creative tasks. The widely used Divergent Association Task (DAT) focuses on novelty, ignoring appropriateness. We evaluate a range of state-of-the-art LLMs on DAT and show that their scores on the task are lower than those of two baselines that do not possess any creative abilities.
arXiv Detail & Related papers (2026-01-28T12:41:32Z)
- CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate LLMs' creativity across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z)
- COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes [83.84578306665976]
Large language models exhibit systematic deficiencies in creative writing, particularly in non-English contexts. We present COIG-Writer, a novel Chinese creative writing dataset that captures both diverse outputs and their underlying thought processes.
arXiv Detail & Related papers (2025-10-16T15:01:19Z)
- Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations [48.57816792550401]
We examine creativity measures including the creativity index, perplexity, syntactic templates, and LLM-as-a-Judge. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity.
arXiv Detail & Related papers (2025-08-07T15:11:48Z)
- Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations [53.950760059792614]
Large Language Models (LLMs) excel at countless tasks, yet struggle with creativity. We introduce a novel approach that couples LLMs with structured representations and cognitively inspired manipulations to generate more creative and diverse ideas. We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z)
- Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models [19.700493685081604]
Large language models (LLMs) are increasingly used for ideation and scientific discovery. Prior work evaluates novelty as originality with respect to training data, but original outputs can be low quality. We propose a new novelty metric for LLM generations that balances originality and quality.
arXiv Detail & Related papers (2025-04-13T00:48:58Z)
- A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models [100.16387798660833]
The Oogiri game is a creativity-driven task requiring humor and associative thinking. LoTbench is an interactive, causality-aware evaluation framework. Results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable.
arXiv Detail & Related papers (2025-01-25T09:11:15Z)
- Evaluating Creative Short Story Generation in Humans and Large Language Models [0.7965327033045846]
Large language models (LLMs) have demonstrated the ability to generate high-quality stories. We conduct a systematic analysis of creativity in short story generation across 60 LLMs and 60 people.
arXiv Detail & Related papers (2024-11-04T17:40:39Z)
- AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text [53.15652021126663]
We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
arXiv Detail & Related papers (2024-10-05T18:55:01Z)
- Characterising the Creative Process in Humans and Large Language Models [6.363158395541767]
We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task.
We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles.
Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity.
Though LLMs as a population match human profiles, their relationship with creativity differs: the more flexible models score higher on creativity.
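The jump-profile method described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: toy bag-of-words count vectors stand in for the sentence embeddings it uses, and the 0.2 similarity threshold is an arbitrary illustrative choice.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jump_profile(responses: list[str], threshold: float = 0.2) -> list[int]:
    """1 where consecutive responses are semantically far apart (a "jump"
    to a new region), 0 where they stay close. Bag-of-words vectors are a
    stand-in for real sentence embeddings."""
    vecs = [Counter(r.lower().split()) for r in responses]
    return [int(cosine(vecs[i - 1], vecs[i]) < threshold)
            for i in range(1, len(vecs))]
```

For instance, `jump_profile(["hold water", "carry water", "grow a plant"])` returns `[0, 1]`: the first two uses share vocabulary (persistent, deep search), while the third jumps to an unrelated semantic region (flexible, broad search).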
arXiv Detail & Related papers (2024-05-01T23:06:46Z)
- SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
We take a dramatic departure with a novel setting in which models use background contexts as input.
We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.