Fugu-MT Paper Translation (Summary): Coding in a Bubble? Evaluating LLMs in Resolving Context Adaptation Bugs During Code Adaptation

Paper overview: Coding in a Bubble? Evaluating LLMs in Resolving Context Adaptation Bugs During Code Adaptation

arxiv url: http://arxiv.org/abs/2601.06497v1
Date: Sat, 10 Jan 2026 09:14:00 GMT
Status: Translation complete
Last updated in system: 2026-01-13 19:08:00.846026
Title: Coding in a Bubble? Evaluating LLMs in Resolving Context Adaptation Bugs During Code Adaptation
Authors: Tanghaoran Zhang, Xinjun Mao, Shangwen Wang, Yuxin Zhao, Yao Lu, Zezhou Tang, Wenyu Xu, Longfei Sun, Changrong Xie, Kang Yang, Yue Yu
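Abstract summary: A key challenge is resolving Context Adaptation Bugs (CtxBugs). Unlike isolated bugs, CtxBugs cannot be resolved through local fixes and require cross-context reasoning. Although Large Language Models (LLMs) show great potential for automating code-related tasks, their ability to resolve CtxBugs remains a significant, unexplored obstacle to their practical use in code adaptation.
Reference score (attention score computed in-house): 16.969255848886693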
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code adaptation is a fundamental but challenging task in software development, requiring developers to modify existing code for new contexts. A key challenge is to resolve Context Adaptation Bugs (CtxBugs), which occur when code that is correct in its original context violates constraints in the target environment. Unlike isolated bugs, CtxBugs cannot be resolved through local fixes and require cross-context reasoning to identify semantic mismatches. Overlooking them may lead to critical failures in adaptation. Although Large Language Models (LLMs) show great potential in automating code-related tasks, their ability to resolve CtxBugs remains a significant and unexplored obstacle to their practical use in code adaptation. To bridge this gap, we propose CtxBugGen, a novel framework for generating CtxBugs to evaluate LLMs. Its core idea is to leverage LLMs' tendency to generate plausible but context-free code when contextual constraints are absent. The framework generates CtxBugs through a four-step process to ensure their relevance and validity: (1) Adaptation Task Selection, (2) Task-specific Perturbation, (3) LLM-based Variant Generation, and (4) CtxBugs Identification. Based on the benchmark constructed by CtxBugGen, we conduct an empirical study with four state-of-the-art LLMs. Our results reveal their unsatisfactory performance in CtxBug resolution. The best-performing LLM, Kimi-K2, achieves 55.93% on Pass@1 and resolves just 52.47% of CtxBugs. The presence of CtxBugs degrades LLMs' adaptation performance by up to 30%. Failure analysis indicates that LLMs often overlook CtxBugs and replicate them in their outputs. Our study highlights a critical weakness in LLMs' cross-context reasoning and emphasizes the need for new methods to enhance their context awareness for reliable code adaptation.
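As a hypothetical illustration of the CtxBug definition above (code that is correct in its original context but violates a constraint of the target environment), consider a helper reused across two codebases that disagree on timestamp units. The function names and the seconds-versus-milliseconds mismatch are invented for this sketch; they are not taken from the paper's benchmark or from CtxBugGen's output.

```python
# Hypothetical illustration of a context adaptation bug (CtxBug).
# This function is correct in its ORIGINAL context, where every
# timestamp is Unix time in seconds.
def is_expired(created_at: float, now: float, ttl_seconds: float) -> bool:
    """Return True once more than `ttl_seconds` have elapsed since creation."""
    return now - created_at > ttl_seconds


# In the (hypothetical) TARGET context, timestamps are Unix time in
# milliseconds. Reusing the helper verbatim still runs, but the elapsed
# time is now 1000x too large relative to the TTL given in seconds.
def is_expired_copied(created_at_ms: int, now_ms: int, ttl_seconds: float) -> bool:
    # Context-free reuse: nothing looks wrong locally, yet the units disagree.
    return is_expired(created_at_ms, now_ms, ttl_seconds)


# Context-aware adaptation: convert the target's milliseconds to seconds
# before reusing the original logic.
def is_expired_adapted(created_at_ms: int, now_ms: int, ttl_seconds: float) -> bool:
    return is_expired(created_at_ms / 1000, now_ms / 1000, ttl_seconds)


if __name__ == "__main__":
    created, now = 1_700_000_000_000, 1_700_000_002_000  # 2 seconds apart, in ms
    print(is_expired_copied(created, now, ttl_seconds=60))   # True  (wrong)
    print(is_expired_adapted(created, now, ttl_seconds=60))  # False (correct)
```

Nothing inside `is_expired_copied` is wrong in isolation; the defect only appears once the target context's units are checked, which is why a local fix cannot find it. The abstract reports that LLMs often overlook such bugs and replicate them in their outputs.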
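The abstract reports results as Pass@1 without defining the metric. Assuming the paper follows the common convention in code-generation evaluation, the unbiased pass@k estimator introduced with Codex (Chen et al., 2021), the per-task score is 1 - C(n - c, k) / C(n, k) for n sampled solutions of which c pass the tests, and the benchmark score is the mean over tasks; for k = 1 this reduces to c / n. The sketch below is that generic estimator, not the authors' evaluation code, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples generated, c of them pass.

    Computes 1 - C(n - c, k) / C(n, k), the probability that at least one
    of k samples drawn without replacement from the n samples is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[tuple[int, int]], k: int = 1) -> float:
    """Average the per-task pass@k over a benchmark of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three hypothetical tasks with 5 samples each: 2, 0 and 5 passing.
    print(benchmark_pass_at_k([(5, 2), (5, 0), (5, 5)], k=1))
```

The abstract reports the share of CtxBugs resolved (52.47% for Kimi-K2) as a figure separate from Pass@1; how that rate is computed is not stated in the abstract, so it is not modeled here.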


Related papers