Fugu-MT Paper Translation (Abstract): How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

Paper overview: How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

arxiv url: http://arxiv.org/abs/2605.06882v1
Date: Thu, 07 May 2026 19:31:43 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.580536
Title: How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
Title (reference translation): On the Performance of LLMs on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem
Authors: Chun Zheng, Lianlong Wu, Bingqian Li, Lvting Liu, Yi Zhou
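Abstract summary: We evaluate the performance of large language models on the simplest long-chain reasoning task. Non-reasoning models fail on ECP, while reasoning models are substantially better but still struggle to solve the problem completely.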
Reference score (independently computed attention metric): 5.006638589584725
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it still remains unclear how good LLMs are for reasoning tasks, especially for long-chain ones. In this paper, we evaluate LLMs' performance on the simplest yet long-chain reasoning task, namely the Equivalence Class Problem (ECP), i.e., determining whether two variables are equal given a set of randomly generated equivalence relations. We consider both reasoning and non-reasoning representative LLMs over a large variety of problem instances, ranging over different numbers of variables, connectivity probabilities, prompts, and other factors. The experimental results show that non-reasoning LLMs fail ECP, while reasoning models are significantly better but still struggle to completely solve this problem. Interestingly, considering various connectivity probabilities with a fixed number of variables, we observe that, for non-reasoning models, the hardest problem instances coincide with the phase transition point of ln n/(n-1), suggesting the chaos of the problem; in contrast, for reasoning models, the hardest ones coincide with the biggest diameter, suggesting the reasoning difficulty of the problem.
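Abstract (reference translation): Large language models (LLMs) have improved greatly in recent years. Nevertheless, it remains unclear how well LLMs handle reasoning tasks, especially long-chain ones. In this paper, we evaluate LLM performance on the simplest long-chain reasoning task, the Equivalence Class Problem (ECP). We consider both reasoning and non-reasoning LLMs over a wide variety of problem instances spanning different numbers of variables, connectivity probabilities, prompts, and other factors. The experiments show that non-reasoning LLMs fail ECP, while reasoning models are better but still struggle to solve the problem completely. Interestingly, across connectivity probabilities with a fixed number of variables, the hardest instances for non-reasoning models coincide with the phase transition point ln n/(n-1), suggesting the chaos of the problem; in contrast, for reasoning models the hardest instances coincide with the largest diameter, suggesting the reasoning difficulty of the problem.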
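
To make the task in the abstract concrete, here is a minimal Python sketch of an ECP instance and a reference solver. It is an illustration only, not the authors' code: the generator assumes each pair of variables is related independently with probability p (an Erdős–Rényi-style reading of "connectivity probability"), and the names generate_instance, UnionFind, are_equal, and threshold are hypothetical. The threshold function simply evaluates the expression ln n/(n-1) quoted in the abstract.

```python
import math
import random


def generate_instance(n, p, seed=None):
    """Return random equalities (i, j) over variables 0..n-1.

    Assumption: every unordered pair is included independently with
    probability p; the paper's actual generator may differ.
    """
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


class UnionFind:
    """Disjoint-set structure used as a ground-truth ECP solver."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def are_equal(n, relations, a, b):
    """Decide whether variables a and b fall in the same equivalence class."""
    uf = UnionFind(n)
    for i, j in relations:
        uf.union(i, j)
    return uf.find(a) == uf.find(b)


def threshold(n):
    """Evaluate ln n / (n - 1), the phase transition point named in the abstract."""
    return math.log(n) / (n - 1)
```

For example, are_equal(4, [(0, 1), (1, 2)], 0, 2) is True because 0 = 1 and 1 = 2 chain together, while are_equal(4, [(0, 1), (1, 2)], 0, 3) is False because variable 3 appears in no relation. Sweeping p around threshold(n) for a fixed n would produce instances near the connectivity regime the abstract describes.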

Related papers