Fugu-MT Paper Translation (Abstract): Baba is LLM: Reasoning in a Game with Dynamic Rules

Paper abstract: Baba is LLM: Reasoning in a Game with Dynamic Rules

arxiv url: http://arxiv.org/abs/2506.19095v1
Date: Mon, 23 Jun 2025 20:16:28 GMT
Status: Translation complete
Last updated in system: 2025-06-25 19:48:23.374054
Title: Baba is LLM: Reasoning in a Game with Dynamic Rules
Authors: Fien van Wetten, Aske Plaat, Max van Duijn
Abstract summary: Large language models (LLMs) are known to perform well on language tasks but struggle with reasoning tasks. This paper explores the ability of LLMs to play the 2D puzzle game Baba is You.
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are known to perform well on language tasks, but struggle with reasoning tasks. This paper explores the ability of LLMs to play the 2D puzzle game Baba is You, in which players manipulate rules by rearranging text blocks that define object properties. Given that this rule-manipulation relies on language abilities and reasoning, it is a compelling challenge for LLMs. Six LLMs are evaluated using different prompt types, including (1) simple, (2) rule-extended and (3) action-extended prompts. In addition, two models (Mistral, OLMo) are finetuned using textual and structural data from the game. Results show that while larger models (particularly GPT-4o) perform better in reasoning and puzzle solving, smaller unadapted models struggle to recognize game mechanics or apply rule changes. Finetuning improves the ability to analyze the game levels, but does not significantly improve solution formulation. We conclude that even for state-of-the-art and finetuned LLMs, reasoning about dynamic rule changes is difficult (specifically, understanding the use-mention distinction). The results provide insights into the applicability of LLMs to complex problem-solving tasks and highlight the suitability of games with dynamically changing rules for testing reasoning and reflection by LLMs.

Related papers