Fugu-MT Paper Translation (Overview): What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

Paper overview: What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

arxiv url: http://arxiv.org/abs/2510.04009v1
Date: Sun, 05 Oct 2025 03:00:50 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.392158
Title: What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Title (reference translation): What Is a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models
Authors: Zicong He, Boxuan Zhang, Weihao Liu, Ruixiang Tang, Lu Cheng
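Abstract summary: We introduce C^2-Eval, a holistic benchmark for the unified assessment of creativity in foundation models (FMs). C^2-Eval distinguishes between two complementary forms of creativity. Our results show that C^2-Eval is an effective lens for examining the evolving landscape of creative AI.
Reference score (attention score computed in-house): 16.81217474424392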
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The meteoric rise of foundation models (FMs) has expanded their capabilities far beyond conventional tasks. Creativity, long regarded as a hallmark of human intelligence and a driver of innovation, is now increasingly recognized as a critical dimension of machine intelligence in the era of generative FMs, complementing traditional measures of accuracy. However, existing evaluation frameworks for creativity remain fragmented, relying on ad hoc metrics not firmly grounded in established theories. To address this gap, we introduce C^2-Eval, a holistic benchmark for unified assessment of creativity in FMs. C^2-Eval distinguishes between two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions (e.g., code generation), and divergent creativity, where tasks are open-ended (e.g., storytelling). It evaluates both dimensions using fine-grained criteria derived from social-science theory, focusing on Usefulness, Originality, and Surprise (U-O-S). Through extensive experiments on leading proprietary and open-source models, we analyze trade-offs in their creative capabilities. Our results highlight both the strengths and challenges of current FMs in pursuing a creative machine mind, showing that C^2-Eval is an effective lens for examining the evolving landscape of creative AI.
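Abstract (reference translation): The meteoric rise of foundation models (FMs) has expanded their capabilities beyond conventional tasks. Creativity, long regarded as a hallmark of human intelligence and a driver of innovation, is now recognized as a critical dimension of machine intelligence in the era of generative FMs, complementing traditional measures of accuracy. However, existing frameworks for evaluating creativity remain fragmented, relying on ad hoc metrics that are not firmly grounded in established theories. C^2-Eval is a holistic benchmark for comprehensively evaluating creativity in FMs. C^2-Eval distinguishes two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions (e.g., code generation), and divergent creativity, where tasks are open-ended (e.g., storytelling). It evaluates both dimensions using fine-grained criteria derived from social-science theory, focusing on Usefulness, Originality, and Surprise (U-O-S). Through extensive experiments on leading proprietary and open-source models, we analyze trade-offs in their creative capabilities. These results show that C^2-Eval is an effective lens for examining the evolving landscape of creative AI.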

