Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models
- URL: http://arxiv.org/abs/2601.20546v1
- Date: Wed, 28 Jan 2026 12:41:32 GMT
- Title: Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models
- Authors: Kumiko Nakajima, Jan Zuiderveld, Sandro Pezzelle
- Abstract summary: Large language models (LLMs) are increasingly used in verbal creative tasks. The widely used Divergent Association Task (DAT) focuses on novelty, ignoring appropriateness. We evaluate a range of state-of-the-art LLMs on DAT and show that their scores on the task are lower than those of two baselines that do not possess any creative abilities.
- Score: 6.036586911740041
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used in verbal creative tasks. However, previous assessments of the creative capabilities of LLMs remain weakly grounded in human creativity theory and are thus hard to interpret. The widely used Divergent Association Task (DAT) focuses on novelty, ignoring appropriateness, a core component of creativity. We evaluate a range of state-of-the-art LLMs on DAT and show that their scores on the task are lower than those of two baselines that do not possess any creative abilities, undermining its validity for model evaluation. Grounded in human creativity theory, which defines creativity as the combination of novelty and appropriateness, we introduce Conditional Divergent Association Task (CDAT). CDAT evaluates novelty conditional on contextual appropriateness, separating noise from creativity better than DAT, while remaining simple and objective. Under CDAT, smaller model families often show the most creativity, whereas advanced families favor appropriateness at lower novelty. We hypothesize that training and alignment likely shift models along this frontier, making outputs more appropriate but less creative. We release the dataset and code.
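The DAT discussed above is scored as the mean pairwise semantic distance between a set of unrelated nouns (the published task uses GloVe word embeddings and scales the mean cosine distance by 100). A minimal sketch of that scoring rule, with toy vectors standing in for real embeddings; the function names and example vectors here are illustrative, not taken from the paper's released code:

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_score(embeddings):
    """Mean pairwise cosine distance across all word pairs, scaled by 100
    (the convention used by the published DAT)."""
    words = list(embeddings)
    dists = [cosine_distance(embeddings[a], embeddings[b])
             for i, a in enumerate(words)
             for b in words[i + 1:]]
    return 100.0 * sum(dists) / len(dists)

# Toy 3-d vectors in place of real word embeddings: semantically
# distant words should yield larger pairwise distances.
toy = {
    "cat":    [1.0, 0.1, 0.0],
    "dog":    [0.9, 0.2, 0.0],
    "galaxy": [0.0, 1.0, 0.3],
}
score = dat_score(toy)
```

Because the score rewards only distance between words, a generator that emits arbitrary rare tokens can score highly without producing anything appropriate, which is the gap the paper's CDAT is designed to close by conditioning novelty on contextual appropriateness.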
Related papers
- LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers [1.9036571490366498]
We show that human writing consistently exhibits significantly higher uncertainty than model outputs. We find that instruction-tuned and reasoning models exacerbate this trend compared to their base counterparts.
arXiv Detail & Related papers (2026-02-18T03:19:12Z)
- CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate LLM creativity across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z)
- Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity [29.58419742230708]
N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. We investigate the relationship between this notion of creativity and n-gram novelty through close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, 91% of top-quartile expressions by n-gram novelty are not judged as creative.
arXiv Detail & Related papers (2025-09-26T17:59:05Z)
- Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations [48.57816792550401]
We examine creativity measures including the creativity index, perplexity, syntactic templates, and LLM-as-a-Judge. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity.
arXiv Detail & Related papers (2025-08-07T15:11:48Z)
- Cooking Up Creativity: Enhancing LLM Creativity through Structured Recombination [46.79423188943526]
We introduce a novel approach that enhances the creativity of Large Language Models (LLMs). We use LLMs to translate between natural language and structured representations, in which the core creative leap is performed. We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z)
- Probing and Inducing Combinational Creativity in Vision-Language Models [52.76981145923602]
Recent advances in Vision-Language Models (VLMs) have sparked debate about whether their outputs reflect combinational creativity. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework.
arXiv Detail & Related papers (2025-04-17T17:38:18Z)
- A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models [100.16387798660833]
The Oogiri game is a creativity-driven task requiring humor and associative thinking. LoTbench is an interactive, causality-aware evaluation framework. Results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable.
arXiv Detail & Related papers (2025-01-25T09:11:15Z)
- Steering Large Language Models to Evaluate and Amplify Creativity [7.031631627161492]
We show that an LLM's knowledge of how to write creatively can be leveraged to better judge what is creative. We take a mechanistic approach that extracts differences in the internal states of an LLM when prompted to respond "boringly" or "creatively".
arXiv Detail & Related papers (2024-12-08T20:28:48Z)
- Creativity Has Left the Chat: The Price of Debiasing Language Models [1.223779595809275]
We investigate the unintended consequences of Reinforcement Learning from Human Feedback on the creativity of Large Language Models (LLMs).
Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation.
arXiv Detail & Related papers (2024-06-08T22:14:51Z)
- Divergent Creativity in Humans and Large Language Models [37.67363469600804]
Large Language Models (LLMs) have led to claims that they are approaching a level of creativity akin to human capabilities. We leverage recent advances in computational creativity to analyze semantic divergence in both state-of-the-art LLMs and a dataset of 100,000 humans. We found evidence that LLMs can surpass average human performance on the Divergent Association Task, and approach human creative writing abilities.
arXiv Detail & Related papers (2024-05-13T22:37:52Z)
- Can AI Be as Creative as Humans? [84.43873277557852]
We prove theoretically that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators.
The debate on AI's creativity is thereby reduced to the question of its ability to fit a sufficient amount of data.
arXiv Detail & Related papers (2024-01-03T08:49:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.