Combinatorial Creativity: A New Frontier in Generalization Abilities
- URL: http://arxiv.org/abs/2509.21043v4
- Date: Mon, 03 Nov 2025 15:40:02 GMT
- Title: Combinatorial Creativity: A New Frontier in Generalization Abilities
- Authors: Samuel Schapiro, Sumuk Shashidhar, Alexi Gladstone, Jonah Black, Royce Moon, Dilek Hakkani-Tur, Lav R. Varshney
- Abstract summary: We study the scaling behavior of creativity for Large Language Models (LLMs). We find that for fixed compute budgets, there exist optimal model depths and widths for creative ability. We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a fundamental novelty-utility tradeoff characteristic of creativity algorithms in general.
- Score: 14.121904952399975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) systems, and Large Language Models (LLMs) in particular, are increasingly employed for creative tasks like scientific idea generation, constituting a form of generalization from training data unaddressed by existing conceptual frameworks. Despite its similarities to compositional generalization (CG), combinatorial creativity (CC) is an open-ended ability. Instead of evaluating for accuracy or correctness against fixed targets, which would contradict the open-ended nature of CC, we propose a theoretical framework and algorithmic task for evaluating outputs by their degrees of novelty and utility. From here, we make several important empirical contributions: (1) We obtain the first insights into the scaling behavior of creativity for LLMs. (2) We discover that, for fixed compute budgets, there exist optimal model depths and widths for creative ability. (3) We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a more fundamental novelty-utility tradeoff characteristic of creativity algorithms in general. Importantly, this tradeoff remains persistent even at scale, casting doubt on the long-term creative potential of LLMs in their current form. Together, our conceptual framework and empirical findings provide a foundation for understanding and improving creativity in modern AI models, bridging the gap between human and machine intelligence.
Related papers
- GIFT: Games as Informal Training for Generalizable LLMs [64.47890325824763]
Large Language Models (LLMs) struggle with "practical wisdom" and generalizable intelligence. This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction. We propose treating Games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity.
arXiv Detail & Related papers (2026-01-09T08:42:44Z) - CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. There is still no holistic framework to evaluate their creativity across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z) - Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark [69.8473923357969]
Unified multimodal models aim to jointly enable visual understanding and generation, yet current benchmarks rarely examine their true integration. We present Uni-MMMU, a comprehensive benchmark that unfolds the bidirectional synergy between generation and understanding across eight reasoning-centric domains.
arXiv Detail & Related papers (2025-10-15T17:10:35Z) - What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models [16.81217474424392]
We introduce C2-Eval, a holistic benchmark for unified assessment of creativity in foundation models (FMs). C2-Eval distinguishes between two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions, and divergent creativity, where tasks are open-ended. Our results show that C2-Eval is an effective lens for examining the evolving landscape of creative AI.
arXiv Detail & Related papers (2025-10-05T03:00:50Z) - Large Language Models as Innovators: A Framework to Leverage Latent Space Exploration for Novelty Discovery [19.394116388173885]
Large language models (LLMs) often struggle to produce outputs that are both novel and relevant. We propose a model-agnostic latent-space ideation framework that enables controlled, scalable creativity.
arXiv Detail & Related papers (2025-07-18T12:54:28Z) - Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations [53.950760059792614]
Large Language Models (LLMs) excel at countless tasks, yet struggle with creativity. We introduce a novel approach that couples LLMs with structured representations and cognitively inspired manipulations to generate more creative and diverse ideas. We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z) - Probing and Inducing Combinational Creativity in Vision-Language Models [52.76981145923602]
Recent advances in Vision-Language Models (VLMs) have sparked debate about whether their outputs reflect combinational creativity. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework.
arXiv Detail & Related papers (2025-04-17T17:38:18Z) - LiveIdeaBench: Evaluating LLMs' Divergent Thinking for Scientific Idea Generation with Minimal Context [13.967898012303325]
We introduce LiveIdeaBench, a benchmark evaluating Large Language Models' scientific idea generation. Our benchmark employs a dynamic panel of state-of-the-art LLMs to assess generated ideas across five key dimensions: originality, feasibility, fluency, flexibility, and clarity. Our results demonstrate that models like QwQ-32B-preview achieve creative performance comparable to top-tier models such as claude-3.7-sonnet:thinking, despite significant gaps in their general intelligence scores.
arXiv Detail & Related papers (2024-12-23T14:13:44Z) - LLMs can Realize Combinatorial Creativity: Generating Creative Ideas via LLMs for Scientific Research [5.564972490390789]
We present a framework that explicitly implements creativity theory using Large Language Models (LLMs). The framework features a generalization-level retrieval system for cross-domain knowledge discovery and a structured process for idea generation. Experiments on the OAG-Bench dataset demonstrate our framework's effectiveness, consistently outperforming baseline approaches in generating ideas that align with real research developments.
arXiv Detail & Related papers (2024-12-18T18:41:14Z) - Benchmarking Language Model Creativity: A Case Study on Code Generation [39.546827184857754]
In this work, we introduce a framework for quantifying LLM creativity. We define NEOGAUGE, a metric that quantifies both convergent and divergent thinking in the generated creative responses. We test the proposed framework on Codeforces problems, which serve as both a natural dataset for coding tasks and a collection of prior human solutions.
arXiv Detail & Related papers (2024-07-12T05:55:22Z) - Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - Creativity and Markov Decision Processes [0.20482269513546453]
We identify formal mappings between Boden's process theory of creativity and Markov Decision Processes (MDPs).
We study three out of eleven mappings in detail to understand which types of creative processes, opportunities for aberrations, and threats to creativity (uninspiration) could be observed in an MDP.
We conclude by discussing quality criteria for the selection of such mappings for future work and applications.
arXiv Detail & Related papers (2024-05-23T18:16:42Z) - Can AI Be as Creative as Humans? [84.43873277557852]
We prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators.
The debate on AI's creativity is reduced into the question of its ability to fit a sufficient amount of data.
arXiv Detail & Related papers (2024-01-03T08:49:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.