Fugu-MT Paper Translation (Summary): A Study on Thinking Patterns of Large Reasoning Models in Code Generation

Paper summary: A Study on Thinking Patterns of Large Reasoning Models in Code Generation

arxiv url: http://arxiv.org/abs/2509.13758v1
Date: Wed, 17 Sep 2025 07:13:12 GMT
Status: Translation complete
Last updated in system: 2025-09-18 18:41:50.753644
Title: A Study on Thinking Patterns of Large Reasoning Models in Code Generation
Authors: Kevin Halim, Sin G. Teo, Ruitao Feng, Zhenpeng Chen, Yang Gu, Chong Wang, Yang Liu
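Abstract summary: Large language models (LLMs) are used for software engineering tasks such as code generation. This paper presents a comprehensive study aimed at investigating and uncovering the reasoning behavior of large reasoning models (LRMs) during code generation. We derive a taxonomy of LRM reasoning behaviors encompassing 15 reasoning actions across four phases.
Reference score (attention score computed in-house): 14.138043269602074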
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Currently, many large language models (LLMs) are utilized for software engineering tasks such as code generation. More advanced models known as large reasoning models (LRMs), such as OpenAI's o3, DeepSeek R1, and Qwen3, have since emerged and have demonstrated the capability to perform multi-step reasoning. Despite these advances, little attention has been paid to systematically analyzing the reasoning patterns these models exhibit and how such patterns influence the generated code. This paper presents a comprehensive study aimed at investigating and uncovering the reasoning behavior of LRMs during code generation. We prompted several state-of-the-art LRMs of varying sizes with code generation tasks and applied open coding to manually annotate the reasoning traces. From this analysis, we derive a taxonomy of LRM reasoning behaviors, encompassing 15 reasoning actions across four phases. Our empirical study based on the taxonomy reveals a series of findings. First, we identify common reasoning patterns, showing that LRMs generally follow a human-like coding workflow, with more complex tasks eliciting additional actions such as scaffolding, flaw detection, and style checks. Second, we compare reasoning across models, finding that Qwen3 exhibits iterative reasoning while DeepSeek-R1-7B follows a more linear, waterfall-like approach. Third, we analyze the relationship between reasoning and code correctness, showing that actions such as unit test creation and scaffold generation strongly support functional outcomes, with LRMs adapting strategies based on task context. Finally, we evaluate lightweight prompting strategies informed by these findings, demonstrating the potential of context- and reasoning-oriented prompts to improve LRM-generated code. Our results offer insights and practical implications for advancing automatic code generation.

List of related papers