Fugu-MT Paper Translation (Summary): CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

Paper summary: CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge

arxiv url: http://arxiv.org/abs/2604.03374v1
Date: Fri, 03 Apr 2026 18:08:14 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.542871
Title: CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
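Title (reference translation): CresOWLve: A benchmark for creative problem-solving over real-world knowledge
Authors: Mete Ismayilzada, Renqing Cuomao, Daniil Yurshevich, Anna Sotnikova, Lonneke van der Plas, Antoine Bosselut
Abstract summary: We introduce CresOWLve, a benchmark for evaluating creative problem-solving using puzzles grounded in real-world knowledge. Problems in CresOWLve require employing multiple creative thinking strategies, retrieving facts from diverse domains, and creatively combining them to arrive at a solution. Models perform substantially better on factual questions than on creative ones.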
Reference score (attention score computed by the site): 19.526111468269892
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creative problem-solving requires combining multiple cognitive abilities, including logical reasoning, lateral thinking, analogy-making, and commonsense knowledge, to discover insights that connect seemingly unrelated pieces of information. However, most existing benchmarks for large language models (LLMs) evaluate only specific components of this process. Moreover, many creativity-oriented benchmarks rely on artificially constructed brainteasers or contrived scenarios that do not reflect how creative problem-solving occurs in real-world settings. To address this gap, we introduce CresOWLve, a benchmark for evaluating creative problem-solving using puzzles grounded in real-world knowledge. Problems in CresOWLve require employing multiple creative thinking strategies, retrieving facts from diverse domains, and creatively combining them to arrive at a solution. Evaluating several frontier non-thinking and thinking LLMs, we show that CresOWLve remains highly challenging. Our analysis reveals a consistent performance gap: models perform substantially better on factual questions than on creative ones (up to a -17% drop). While models can often retrieve the relevant knowledge, they struggle to form the non-obvious creative connections required to integrate this information and arrive at the correct answer.
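Abstract (reference translation): Creative problem-solving requires combining multiple cognitive abilities, including logical reasoning, lateral thinking, analogy-making, and commonsense knowledge. However, most existing benchmarks for large language models (LLMs) evaluate only specific components of this process. Moreover, many creativity-oriented benchmarks rely on artificially constructed brainteasers or contrived scenarios that do not reflect how creative problem-solving occurs in real-world settings. To address this gap, we introduce CresOWLve, a benchmark for evaluating creative problem-solving using puzzles grounded in real-world knowledge. Problems in CresOWLve require employing multiple creative thinking strategies, retrieving facts from diverse domains, and creatively combining them to arrive at a solution. Evaluating several frontier non-thinking and thinking LLMs, we find that CresOWLve remains highly challenging. Models perform substantially better on factual questions than on creative ones (up to a 17% drop). While models can often retrieve the relevant knowledge, they struggle to form the non-obvious creative connections required to integrate this information and arrive at the correct answer.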
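The abstract reports the factual-versus-creative gap as a drop of up to 17%, but does not state here whether that figure is an absolute difference in percentage points or a change relative to factual accuracy. The sketch below is illustrative only: it assumes per-question correctness labels for a factual split and a creative split, and computes both readings of the gap. The names `accuracy_gap` and `GapReport` and the toy labels are hypothetical and are not taken from the paper; they are not the authors' evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class GapReport:
    factual_acc: float          # fraction correct on factual questions
    creative_acc: float         # fraction correct on creative questions
    absolute_change_pp: float   # creative minus factual, in percentage points
    relative_change_pct: float  # change relative to factual accuracy, in percent


def accuracy(labels: list[bool]) -> float:
    """Fraction of questions answered correctly (True = correct)."""
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    return sum(labels) / len(labels)


def accuracy_gap(factual: list[bool], creative: list[bool]) -> GapReport:
    """Hypothetical helper: compare accuracy on factual vs. creative splits.

    Negative changes mean the model does worse on the creative split.
    """
    f = accuracy(factual)
    c = accuracy(creative)
    absolute = (c - f) * 100
    relative = (c - f) / f * 100 if f > 0 else float("nan")
    return GapReport(f, c, absolute, relative)


# Hypothetical toy labels, NOT results from the paper.
factual = [True] * 8 + [False] * 2   # 80% correct
creative = [True] * 6 + [False] * 4  # 60% correct
report = accuracy_gap(factual, creative)
print(f"absolute: {report.absolute_change_pp:+.1f} pp, "
      f"relative: {report.relative_change_pct:+.1f}%")
```

With these toy labels the same 80% to 60% shift reads as -20 percentage points in absolute terms but -25% in relative terms, which is why the unit of a reported "drop" matters when comparing models.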


Related papers