Related papers: What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

URL: http://arxiv.org/abs/2510.04009v1
Date: Sun, 05 Oct 2025 03:00:50 GMT
Title: What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Authors: Zicong He, Boxuan Zhang, Weihao Liu, Ruixiang Tang, Lu Cheng
Abstract summary: We introduce C2-Eval, a holistic benchmark for unified assessment of creativity in foundation models (FMs). C2-Eval distinguishes between two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions, and divergent creativity, where tasks are open-ended. Our results show that C2-Eval is an effective lens for examining the evolving landscape of creative AI.
Score: 16.81217474424392
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The meteoric rise of foundation models (FMs) has expanded their capabilities far beyond conventional tasks. Creativity, long regarded as a hallmark of human intelligence and a driver of innovation, is now increasingly recognized as a critical dimension of machine intelligence in the era of generative FMs, complementing traditional measures of accuracy. However, existing evaluation frameworks for creativity remain fragmented, relying on ad hoc metrics not firmly grounded in established theories. To address this gap, we introduce C^2-Eval, a holistic benchmark for unified assessment of creativity in FMs. C^2-Eval distinguishes between two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions (e.g., code generation), and divergent creativity, where tasks are open-ended (e.g., storytelling). It evaluates both dimensions using fine-grained criteria derived from social-science theory, focusing on Usefulness, Originality, and Surprise (U-O-S). Through extensive experiments on leading proprietary and open-source models, we analyze trade-offs in their creative capabilities. Our results highlight both the strengths and challenges of current FMs in pursuing a creative machine mind, showing that C^2-Eval is an effective lens for examining the evolving landscape of creative AI.

Related papers

Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models [6.036586911740041]
Large language models (LLMs) are increasingly used in verbal creative tasks. The widely used Divergent Association Task (DAT) focuses on novelty, ignoring appropriateness. We evaluate a range of state-of-the-art LLMs on DAT and show that their scores on the task are lower than those of two baselines that do not possess any creative abilities.
arXiv Detail & Related papers (2026-01-28T12:41:32Z)
CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate their creativity across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z)
Combinatorial Creativity: A New Frontier in Generalization Abilities [14.121904952399975]
We study the scaling behavior of creativity for Large Language Models (LLMs). We find that for fixed compute budgets, there exist optimal model depths and widths for creative ability. We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a fundamental novelty-utility tradeoff characteristic of creativity algorithms in general.
arXiv Detail & Related papers (2025-09-25T11:48:37Z)
Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations [48.57816792550401]
We examine creativity measures including the creativity index, perplexity, syntactic templates, and LLM-as-a-Judge. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity.
arXiv Detail & Related papers (2025-08-07T15:11:48Z)
Creativity in LLM-based Multi-Agent Systems: A Survey [56.25583236738877]
Large language model (LLM)-driven multi-agent systems (MAS) are transforming how humans and AIs collaboratively generate ideas and artifacts. This is the first survey dedicated to creativity in MAS. We focus on text and image generation tasks, and present: (1) a taxonomy of agent proactivity and persona design; (2) an overview of generation techniques, including divergent exploration, iterative refinement, and collaborative synthesis, as well as relevant datasets and evaluation metrics; and (3) a discussion of key challenges, such as inconsistent evaluation standards, insufficient bias mitigation, coordination conflicts, and the lack of unified benchmarks.
arXiv Detail & Related papers (2025-05-27T12:36:14Z)
Probing and Inducing Combinational Creativity in Vision-Language Models [52.76981145923602]
Recent advances in Vision-Language Models (VLMs) have sparked debate about whether their outputs reflect combinational creativity. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework.
arXiv Detail & Related papers (2025-04-17T17:38:18Z)
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM [58.42678619252968]
Creation-MMBench is a benchmark designed to evaluate the creative capabilities of Multimodal Large Language Models. The benchmark comprises 765 test cases spanning 51 fine-grained tasks. Experimental results reveal that open-source MLLMs significantly underperform compared to proprietary models in creative tasks.
arXiv Detail & Related papers (2025-03-18T17:51:34Z)
Creativity and Markov Decision Processes [0.20482269513546453]
We identify formal mappings between Boden's process theory of creativity and Markov Decision Processes (MDPs). We study three out of eleven mappings in detail to understand which types of creative processes, opportunities for aberrations, and threats to creativity (uninspiration) could be observed in an MDP. We conclude by discussing quality criteria for the selection of such mappings for future work and applications.
arXiv Detail & Related papers (2024-05-23T18:16:42Z)
Can AI Be as Creative as Humans? [84.43873277557852]
We prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators. The debate on AI's creativity is reduced into the question of its ability to fit a sufficient amount of data.
arXiv Detail & Related papers (2024-01-03T08:49:12Z)
Automatic Creativity Measurement in Scratch Programs Across Modalities [6.242018846706069]
We make the journey from defining a formal measure of creativity that is efficiently computable to applying the measure in a practical domain. We adapted the general measure for projects in the popular visual programming language Scratch. We designed a machine learning model for predicting the creativity of Scratch projects, trained and evaluated on human expert creativity assessments.
arXiv Detail & Related papers (2022-11-07T10:43:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.