Fugu-MT Paper Translation (Abstract): Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

Paper overview: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models

arxiv url: http://arxiv.org/abs/2505.10844v1
Date: Fri, 16 May 2025 04:23:34 GMT
Status: Translation complete
Last updated in system: 2025-05-19 14:36:14.067852
Title: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Authors: Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy,
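Abstract summary: We introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models use. Brainteasers can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force.
Reference score (attention score computed independently by the site): 28.791905315055974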
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions. In this work, we introduce a benchmark based on brainteasers written in long narrative form to probe more deeply into the types of reasoning strategies that models use. Brainteasers are well-suited for this goal because they can be solved with multiple approaches, such as a few-step solution that uses a creative insight or a longer solution that uses more brute force. We investigate large language models (LLMs) across multiple layers of reasoning, focusing not only on correctness but also on the quality and creativity of their solutions. We investigate many aspects of the reasoning process: (1) semantic parsing of the brainteasers into precise mathematical competition style formats; (2) generating solutions from these mathematical forms; (3) self-correcting solutions based on gold solutions; (4) producing step-by-step sketches of solutions; and (5) making use of hints. We find that LLMs are in many cases able to find creative, insightful solutions to brainteasers, suggesting that they capture some of the capacities needed to solve novel problems in creative ways. Nonetheless, there also remain situations where they rely on brute force despite the availability of more efficient, creative solutions, highlighting a potential direction for improvement in the reasoning abilities of LLMs.
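As a toy illustration of the contrast the abstract draws between a few-step creative solution and a longer brute-force one, consider the classic puzzle of counting the matches in a single-elimination tournament. This puzzle and the function names below are assumptions chosen for illustration; they are not taken from the paper or its benchmark.

```python
def matches_brute_force(players: int) -> int:
    """Brute force: simulate the tournament round by round and count matches."""
    matches = 0
    remaining = players
    while remaining > 1:
        played = remaining // 2   # pairs that play this round (an odd player gets a bye)
        matches += played
        remaining -= played       # each match eliminates exactly one player
    return matches


def matches_insight(players: int) -> int:
    """Insight: every match eliminates one player, and everyone but the champion is eliminated."""
    return max(players - 1, 0)


if __name__ == "__main__":
    for n in (1, 5, 64, 1000):
        print(n, matches_brute_force(n), matches_insight(n))
```

Both functions return the same count, but the second reaches it in one step by reframing the question, which is the kind of difference in reasoning strategy that accuracy alone would not reveal.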
