Fugu-MT Paper Translation (Abstract): Exploring the Potential and Limitations of Large Language Models for Novice Program Fault Localization

Paper abstract: Exploring the Potential and Limitations of Large Language Models for Novice Program Fault Localization

arxiv url: http://arxiv.org/abs/2512.03421v1
Date: Wed, 03 Dec 2025 03:55:18 GMT
Status: Translation complete
Last updated in system: 2025-12-04 20:02:55.111008
Title: Exploring the Potential and Limitations of Large Language Models for Novice Program Fault Localization
Authors: Hexiang Xu, Hengyuan Liu, Yonghao Wu, Xiaolan Kang, Xiang Chen, Yong Liu,
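Abstract summary: Novice programmers often face challenges in fault localization because of their limited experience and limited understanding of programming syntax and logic. Large Language Models (LLMs) have shown promise in overcoming these limitations by drawing on their ability to understand program syntax and semantics. This study evaluates six closed-source and seven open-source LLMs using the Codeflaws, Condefects, and BugT datasets.
Reference score (independently computed attention score): 13.571471290271122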
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Novice programmers often face challenges in fault localization due to their limited experience and understanding of programming syntax and logic. Traditional methods like Spectrum-Based Fault Localization (SBFL) and Mutation-Based Fault Localization (MBFL) help identify faults but often lack the ability to understand code context, making them less effective for beginners. In recent years, Large Language Models (LLMs) have shown promise in overcoming these limitations by utilizing their ability to understand program syntax and semantics. LLM-based fault localization provides more accurate and context-aware results than traditional techniques. This study evaluates six closed-source and seven open-source LLMs using the Codeflaws, Condefects, and BugT datasets, with BugT being a newly constructed dataset specifically designed to mitigate data leakage concerns. Advanced models with reasoning capabilities, such as OpenAI o3 and DeepSeekR1, achieve superior accuracy with minimal reliance on prompt engineering. In contrast, models without reasoning capabilities, like GPT-4, require carefully designed prompts to maintain performance. While LLMs perform well in simple fault localization, their accuracy decreases as problem difficulty increases, though top models maintain robust performance on the BugT dataset. Over-reasoning is another challenge: some models generate excessive explanations that obscure the fault localization result. Additionally, the computational cost of deploying LLMs remains a significant barrier to real-time debugging. LLM explanations offer significant value for assisting novice programmers, with participants who have one year of experience consistently rating them highly. Our findings demonstrate the potential of LLMs to improve debugging efficiency while stressing the need for further refinement in their reasoning and computational efficiency for practical adoption.


Related papers