Fugu-MT Paper Translation (Summary): \texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs

Paper summary: \texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs

arxiv url: http://arxiv.org/abs/2511.00488v1
Date: Sat, 01 Nov 2025 10:42:40 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.803224
Title: \texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs
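Title (reference translation): \texttt{ReMind}: Understanding Deductive Code Reasoning in LLMs
Authors: Jun Gao, Yun Peng, Xiaoxue Ren
Abstract summary: Large language models (LLMs) have made remarkable progress on code-related tasks. They still struggle with deductive code reasoning, the ability to reason about the program execution process. \texttt{ReMind} is a multi-agent framework composed of \texttt{Mutator}, \texttt{Executor}, and \texttt{Inspector}.
Reference score (independently computed attention score): 6.918479033945452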
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have achieved remarkable progress in code-related tasks. Despite their advancement, empirical evidence reveals that they still struggle with \emph{deductive code reasoning}, the ability to reason about the program execution process. While prior studies have recognized this limitation, the underlying causes remain largely underexplored. In this paper, we begin by presenting a comprehensive empirical study that reveals three key challenges undermining deductive code reasoning: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. In light of these challenges, we propose \texttt{ReMind}, a multi-agent framework composed of \texttt{Mutator}, \texttt{Executor}, and \texttt{Inspector}. The \texttt{Mutator} generates code variants to mitigate bias towards code sources, the \texttt{Executor} traces variable states step-by-step to expose inconsistency, and the \texttt{Inspector} identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Through their coordinated collaboration, \texttt{ReMind} systematically identifies and refines reasoning flaws, achieving outstanding performance and enabling robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs demonstrate the superior advantages of \texttt{ReMind} compared to baseline approaches in deductive code reasoning.
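Abstract (reference translation): Large language models (LLMs) have made remarkable progress on code-related tasks. Despite this progress, empirical evidence shows that they still struggle with \emph{deductive code reasoning}, the ability to reason about the program execution process. Prior studies have recognized this limitation, but its underlying causes remain largely unexplored. In this paper, we identify three key challenges: (1) an intrinsic gap between generation and reasoning abilities, (2) a consistent bias towards code sources, and (3) weak zero-shot generalization on complex benchmarks. In light of these challenges, we propose \texttt{ReMind}, a multi-agent framework composed of \texttt{Mutator}, \texttt{Executor}, and \texttt{Inspector}. The \texttt{Mutator} generates code variants to mitigate bias towards code sources, the \texttt{Executor} traces variable states step by step to expose inconsistencies, and the \texttt{Inspector} identifies problematic reasoning steps and provides control-flow refinement to bridge the intrinsic reasoning gap. Through their coordinated collaboration, \texttt{ReMind} systematically identifies and refines reasoning flaws, achieving strong performance and enabling robust zero-shot generalization. Extensive experiments on two benchmarks with five LLMs show the advantages of \texttt{ReMind} over baseline approaches in deductive code reasoning.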
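
The abstract describes tracing variable states step by step to expose inconsistencies in a model's reasoning. The Python sketch below illustrates that general idea only; it is not the \texttt{ReMind} implementation, and the abstract gives no implementation details. It records the real variable states of a function with the standard `sys.settrace` hook, then finds the first step where a predicted trace (for example, one written by an LLM) diverges from the real one. The names `trace_variable_states`, `first_divergence`, and `running_sum` are hypothetical and introduced here for illustration.

```python
import sys


def trace_variable_states(func, *args):
    """Run func(*args); record (line offset, local variables) before each executed line."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))  # snapshot, not a live view
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps


def first_divergence(predicted, actual):
    """Return the index of the first differing step, or None if the traces match."""
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p != a:
            return i
    if len(predicted) != len(actual):
        return min(len(predicted), len(actual))
    return None


# Hypothetical program under analysis.
def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, steps = trace_variable_states(running_sum, 3)
    for offset, local_vars in steps:
        print(offset, local_vars)
    print("result:", result)
```

Comparing a predicted trace against the recorded one with `first_divergence` pinpoints the earliest reasoning step that disagrees with actual execution, which is the kind of inconsistency the abstract says step-by-step tracing is meant to expose.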

Related papers