Fugu-MT Paper Translation (Abstract): Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

Paper abstract: Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

arxiv url: http://arxiv.org/abs/2605.09678v1
Date: Sun, 10 May 2026 17:55:38 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.367479
Title: Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities
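Title (reference translation): A Simple Yet Powerful Method for Probing LLM Reasoning Capabilities
Authors: Ryan Albright, Golam Md Muktadir, Zarif Ikram, S M Jubaer, Mehrab Hossain, Dianbo Liu
Abstract summary: This paper proposes Absurd World, a benchmarking framework for testing large language models against altered realism. It evaluates a large collection of models with simple and advanced prompting techniques and demonstrates that it is an effective tool for determining LLMs' ability to think logically.
Reference score (independently computed attention score): 3.706540783851095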
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While extremely powerful and versatile at various tasks, the thinking capabilities of large language models (LLMs) are often put under scrutiny as they sometimes fail to solve problems that humans can systematically solve. However, recent literature focuses on breaking LLM reasoning with increasingly complex problems, and whether an LLM is robust in simple logical reasoning remains underexplored. This paper proposes Absurd World, a benchmarking framework, to test LLMs against altered realism, where scenarios are logically coherent, and humans can easily solve the tasks. Absurd World breaks a real-world model into symbols, actions, sequences, and events, which are automatically altered to create absurd worlds where the logic to solve the tasks remains the same. It evaluates a large collection of models with simple and advanced prompting techniques, and proves that it is an effective tool to determine LLMs' ability to think logically, ignoring the patterns learned from the real world. One can use this framework to extensively test an LLM against a real-world problem to verify whether the LLM's reasoning capability is robust against variations of the task.
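Abstract (reference translation): Although large language models (LLMs) are extremely powerful and versatile across a variety of tasks, their thinking capabilities are often scrutinized because they sometimes fail to solve problems that humans can solve systematically. Recent literature, however, focuses on breaking LLM reasoning with increasingly complex problems, while whether an LLM is robust in simple logical reasoning remains underexplored. This paper proposes Absurd World, a benchmarking framework for testing LLMs against altered realism, in which scenarios are logically coherent and humans can easily solve the tasks. Absurd World breaks a real-world model into symbols, actions, sequences, and events. It evaluates a large collection of models with simple and advanced prompting techniques and demonstrates that it is an effective tool for determining LLMs' ability to think logically while ignoring patterns learned from the real world. The framework can be used to test an LLM extensively against a real-world problem and to verify whether its reasoning capability is robust to variations of the task.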
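The abstract describes altering the symbols of a real-world task while keeping the logic needed to solve it unchanged. The following is a minimal Python sketch of that general idea only. It is not the paper's implementation: the function `absurdify`, the symbol mapping, and the sample task are all hypothetical, and the paper's handling of actions, sequences, and events is not modeled here.

```python
import re


def absurdify(text: str, mapping: dict[str, str]) -> str:
    """Replace whole-word symbols with absurd counterparts, consistently.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not mapping:
        return text
    # Longest keys first so that a longer symbol is never shadowed by a shorter one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def solve(start: int, given: int) -> int:
    """The underlying logic of the task: a single subtraction."""
    return start - given


# Assumed sample task and symbol mapping (both invented for this sketch).
real_task = (
    "Alice has 5 apples and gives 2 apples to Bob. "
    "How many apples does Alice have left?"
)
mapping = {"apples": "volcanoes", "Alice": "Teapot", "Bob": "Tuesday"}

absurd_task = absurdify(real_task, mapping)

# The quantities, and therefore the answer, are untouched by the substitution.
real_numbers = [int(n) for n in re.findall(r"\d+", real_task)]
absurd_numbers = [int(n) for n in re.findall(r"\d+", absurd_task)]
answer = solve(*absurd_numbers)

print(absurd_task)
print(answer)
```

Because only the surface symbols change, the same `solve` function answers both versions of the task. A model that reasons from the stated quantities should do equally well on either, while one leaning on real-world patterns may not.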

Related papers