Fugu-MT Paper Translation (Abstract): CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

Paper overview: CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

arxiv url: http://arxiv.org/abs/2603.11863v1
Date: Thu, 12 Mar 2026 12:36:56 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.071536
Title: CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
Title (reference translation): CreativeBench: Benchmarking and Improving Machine Creativity via Self-Evolving Challenges
Authors: Zi-Han Wang, Lam Nguyen, Zhengyang Zhao, Mengyue Yang, Chengwei Qin, Yujiu Yang, Linyi Yang
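Abstract summary: We introduce CreativeBench, a benchmark for evaluating machine creativity in code generation. CreativeBench objectively distinguishes creativity from hallucination via a unified metric defined as the product of quality and novelty. We propose EvoRePE, a plug-and-play inference-time steering strategy that internalizes evolutionary search patterns to consistently enhance machine creativity.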
Reference score (independently computed attention level): 69.3795501613098
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The saturation of high-quality pre-training data has shifted research focus toward evolutionary systems capable of continuously generating novel artifacts, leading to the success of AlphaEvolve. However, the progress of such systems is hindered by the lack of rigorous, quantitative evaluation. To tackle this challenge, we introduce CreativeBench, a benchmark for evaluating machine creativity in code generation, grounded in a classical cognitive framework. Comprising two subsets -- CreativeBench-Combo and CreativeBench-Explore -- the benchmark targets combinatorial and exploratory creativity through an automated pipeline utilizing reverse engineering and self-play. By leveraging executable code, CreativeBench objectively distinguishes creativity from hallucination via a unified metric defined as the product of quality and novelty. Our analysis of state-of-the-art models reveals distinct behaviors: (1) scaling significantly improves combinatorial creativity but yields diminishing returns for exploration; (2) larger models exhibit "convergence-by-scaling," becoming more correct but less divergent; and (3) reasoning capabilities primarily benefit constrained exploration rather than combination. Finally, we propose EvoRePE, a plug-and-play inference-time steering strategy that internalizes evolutionary search patterns to consistently enhance machine creativity.
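Abstract (reference translation): The saturation of high-quality pre-training data has shifted research focus toward evolutionary systems that can continuously generate novel artifacts, leading to the success of AlphaEvolve. However, progress on such systems is hindered by the lack of rigorous, quantitative evaluation. To address this challenge, we introduce CreativeBench, a benchmark for evaluating machine creativity in code generation, grounded in a classical cognitive framework. The benchmark comprises two subsets, CreativeBench-Combo and CreativeBench-Explore, and targets combinatorial and exploratory creativity through an automated pipeline that uses reverse engineering and self-play. By leveraging executable code, CreativeBench objectively distinguishes creativity from hallucination via a unified metric defined as the product of quality and novelty. Analysis of state-of-the-art models reveals distinct behaviors: (1) scaling significantly improves combinatorial creativity but yields diminishing returns for exploration; (2) larger models exhibit "convergence-by-scaling," becoming more correct but less divergent; and (3) reasoning capabilities primarily benefit constrained exploration rather than combination. Finally, we propose EvoRePE, a plug-and-play inference-time steering strategy that internalizes evolutionary search patterns to consistently enhance machine creativity.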
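
The abstract describes the unified metric only as "the product of quality and novelty." A minimal Python sketch of why a product separates creativity from hallucination follows. It assumes both components are normalized to [0, 1]; the function name creativity_score and the way quality and novelty would be measured are illustrative assumptions, not details taken from the paper.

```python
def creativity_score(quality: float, novelty: float) -> float:
    """Hypothetical unified score: the product of quality and novelty.

    Assumption (not from the paper): both inputs are normalized to [0, 1],
    e.g. quality as a test pass rate and novelty as one minus a similarity
    to a reference solution.
    """
    for name, value in (("quality", quality), ("novelty", novelty)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return quality * novelty


# Novel but non-working code (hallucination-like): the product collapses to zero.
hallucination = creativity_score(quality=0.0, novelty=0.9)

# Correct code that merely copies the reference: also zero.
copy_of_reference = creativity_score(quality=1.0, novelty=0.0)

# Code that is both mostly correct and fairly different scores above zero.
creative = creativity_score(quality=0.8, novelty=0.5)
```

Under these assumptions, a multiplicative score rewards only outputs that are both working and different, whereas an additive score would still give credit to novel output that fails to run.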


Related papers