Probing and Inducing Combinational Creativity in Vision-Language Models
- URL: http://arxiv.org/abs/2504.13120v2
- Date: Tue, 29 Apr 2025 14:51:47 GMT
- Title: Probing and Inducing Combinational Creativity in Vision-Language Models
- Authors: Yongqian Peng, Yuxi Ma, Mengmeng Wang, Yuxuan Wang, Yizhou Wang, Chi Zhang, Yixin Zhu, Zilong Zheng,
- Abstract summary: Recent advances in Vision-Language Models (VLMs) have sparked debate about whether their outputs reflect combinational creativity. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework.
- Score: 52.76981145923602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) like GPT-4V and DALLE-3 have sparked debate about whether their outputs reflect combinational creativity--defined by M. A. Boden (1998) as synthesizing novel ideas through combining existing concepts--or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of concept blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that in comprehension tasks, the best VLMs surpass average human performance while falling short of expert-level understanding; in generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLMs' outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.
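The three-level IEI decomposition described in the abstract can be pictured as an annotation schema. The sketch below is a minimal illustration, assuming a simple list-of-strings representation; the class and field names are hypothetical and need not match the actual CreativeMashup annotation format.

```python
from dataclasses import dataclass

# Hypothetical sketch of an IEI-style annotation for a visual mashup.
# Field names are illustrative; the real CreativeMashup schema may differ.
@dataclass
class IEIAnnotation:
    # Level 1 (Identification): the input concept spaces being blended
    input_concepts: list[str]
    # Level 2 (Explanation): attributes shared across the input spaces
    shared_attributes: list[str]
    # Level 3 (Implication): the novel semantic meaning of the blend
    implication: str

    def completed_levels(self) -> int:
        """Count how many of the three IEI levels are filled in."""
        return sum([
            bool(self.input_concepts),
            bool(self.shared_attributes),
            bool(self.implication),
        ])

# Example: a mashup blending "hedgehog" and "cactus"
example = IEIAnnotation(
    input_concepts=["hedgehog", "cactus"],
    shared_attributes=["spiky surface", "rounded body"],
    implication="a desert creature whose spines double as cactus needles",
)
print(example.completed_levels())  # 3
```

A comprehension task would then ask a VLM to recover all three fields from an image, while a generation pipeline could fill them in top-down before producing the mashup.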
Related papers
- Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models [9.943285575387849]
Associative thinking is a foundational element of human creativity and problem-solving. This paper explores whether reinforcement learning guided by associative thinking principles can enhance a model's performance across diverse generative tasks.
arXiv Detail & Related papers (2025-11-22T02:10:27Z) - CreativityPrism: A Holistic Benchmark for Large Language Model Creativity [64.18257552903151]
Creativity is often seen as a hallmark of human intelligence. Yet there is still no holistic framework to evaluate the creativity of large language models across diverse scenarios. We propose CreativityPrism, an evaluation analysis framework that decomposes creativity into three dimensions: quality, novelty, and diversity.
arXiv Detail & Related papers (2025-10-23T00:22:10Z) - VLM-Guided Adaptive Negative Prompting for Creative Generation [21.534474554320823]
Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation. We show consistent gains in creative novelty with negligible computational overhead.
arXiv Detail & Related papers (2025-10-12T17:34:59Z) - Combinatorial Creativity: A New Frontier in Generalization Abilities [14.121904952399975]
We study the scaling behavior of creativity for Large Language Models (LLMs). We find that for fixed compute budgets, there exist optimal model depths and widths for creative ability. We find that the ideation-execution gap, whereby LLMs excel at generating novel scientific ideas but struggle to ensure their practical feasibility, may be explained by a fundamental novelty-utility tradeoff characteristic of creativity algorithms in general.
arXiv Detail & Related papers (2025-09-25T11:48:37Z) - Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation [50.22361866757033]
Unified vision-language models (VLMs) integrate both visual understanding and generation capabilities. This paper systematically investigates the generalization across understanding and generation tasks in unified VLMs.
arXiv Detail & Related papers (2025-05-29T03:40:21Z) - Creativity in LLM-based Multi-Agent Systems: A Survey [56.25583236738877]
Large language model (LLM)-driven multi-agent systems (MAS) are transforming how humans and AIs collaboratively generate ideas and artifacts. This is the first survey dedicated to creativity in MAS. We focus on text and image generation tasks, and present: (1) a taxonomy of agent proactivity and persona design; (2) an overview of generation techniques, including divergent exploration, iterative refinement, and collaborative synthesis, as well as relevant datasets and evaluation metrics; and (3) a discussion of key challenges, such as inconsistent evaluation standards, insufficient bias mitigation, coordination conflicts, and the lack of unified benchmarks.
arXiv Detail & Related papers (2025-05-27T12:36:14Z) - Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations [53.950760059792614]
Large Language Models (LLMs) excel at countless tasks, yet struggle with creativity.
We introduce a novel approach that couples LLMs with structured representations and cognitively inspired manipulations to generate more creative and diverse ideas.
We demonstrate our approach in the culinary domain with DishCOVER, a model that generates creative recipes.
arXiv Detail & Related papers (2025-04-29T11:13:06Z) - Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM [58.42678619252968]
Creation-MMBench is a benchmark designed to evaluate the creative capabilities of Multimodal Large Language Models. The benchmark comprises 765 test cases spanning 51 fine-grained tasks. Experimental results reveal that open-source MLLMs significantly underperform compared to proprietary models in creative tasks.
arXiv Detail & Related papers (2025-03-18T17:51:34Z) - A Causality-aware Paradigm for Evaluating Creativity of Multimodal Large Language Models [100.16387798660833]
The Oogiri game is a creativity-driven task requiring humor and associative thinking. LoTbench is an interactive, causality-aware evaluation framework. Results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable.
arXiv Detail & Related papers (2025-01-25T09:11:15Z) - LLMs can Realize Combinatorial Creativity: Generating Creative Ideas via LLMs for Scientific Research [5.564972490390789]
We present a framework that explicitly implements creativity theory using Large Language Models (LLMs). The framework features a generalization-level retrieval system for cross-domain knowledge discovery and a structured process for idea generation. Experiments on the OAG-Bench dataset demonstrate our framework's effectiveness, consistently outperforming baseline approaches in generating ideas that align with real research developments.
arXiv Detail & Related papers (2024-12-18T18:41:14Z) - VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks [48.67062958311173]
VL-GLUE is a multitask benchmark for natural language understanding.
We show that this benchmark is quite challenging for existing large-scale vision-language models.
arXiv Detail & Related papers (2024-10-17T15:27:17Z) - MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models [85.10375181040436]
We propose MMCOMPOSITION, a novel human-annotated benchmark for comprehensively and accurately evaluating Vision-Language Models.
We find GPT-4o's compositionality inferior to the best open-source model, and we analyze the underlying reasons.
arXiv Detail & Related papers (2024-10-13T05:35:09Z) - Creativity Has Left the Chat: The Price of Debiasing Language Models [1.223779595809275]
We investigate the unintended consequences of Reinforcement Learning from Human Feedback on the creativity of Large Language Models (LLMs).
Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation.
arXiv Detail & Related papers (2024-06-08T22:14:51Z) - Divergent Creativity in Humans and Large Language Models [37.67363469600804]
The recent surge in the capabilities of Large Language Models has led to claims that they are approaching a level of creativity akin to human capabilities.
We leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans.
arXiv Detail & Related papers (2024-05-13T22:37:52Z) - An Artificial Intelligence Approach for Interpreting Creative Combinational Designs [1.3948357001626264]
Combinational creativity is a form of creativity involving the blending of familiar ideas.
This study focuses on the computational interpretation, specifically identifying the 'base' and 'additive' components that constitute a creative design.
arXiv Detail & Related papers (2024-05-08T11:47:32Z) - Can AI Be as Creative as Humans? [84.43873277557852]
We prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators.
The debate on AI's creativity thus reduces to the question of its ability to fit a sufficient amount of data.
arXiv Detail & Related papers (2024-01-03T08:49:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.