Fugu-MT Paper Translation (Abstract): A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code

Paper summary: A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code

arxiv url: http://arxiv.org/abs/2511.15817v1
Date: Wed, 19 Nov 2025 19:18:28 GMT
Status: Translation complete
Last updated in system: 2025-11-21 17:08:52.344636
Title: A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code
Authors: Alejandro Velasco, Daniel Rodriguez-Cardenas, Dipin Khati, David N. Palacio, Luftar Rahman Alif, Denys Poshyvanyk
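Abstract summary: The Propensity Smelly Score (PSC) is a metric that estimates the probability of generating particular smell types. The authors identify how generation strategy, model size, model architecture and prompt formulation shape the structural properties of generated code. PSC helps developers interpret model behavior and assess code quality.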
Reference score (attention level, computed independently by this site): 49.09545217453401
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent advances in large language models (LLMs) have accelerated their adoption in software engineering contexts. However, concerns persist about the structural quality of the code they produce. In particular, LLMs often replicate poor coding practices, introducing code smells (i.e., patterns that hinder readability, maintainability, or design integrity). Although prior research has examined the detection or repair of smells, we still lack a clear understanding of how and when these issues emerge in generated code. This paper addresses this gap by systematically measuring, explaining and mitigating smell propensity in LLM-generated code. We build on the Propensity Smelly Score (PSC), a probabilistic metric that estimates the likelihood of generating particular smell types, and establish its robustness as a signal of structural quality. Using PSC as an instrument for causal analysis, we identify how generation strategy, model size, model architecture and prompt formulation shape the structural properties of generated code. Our findings show that prompt design and architectural choices play a decisive role in smell propensity and motivate practical mitigation strategies that reduce its occurrence. A user study further demonstrates that PSC helps developers interpret model behavior and assess code quality, providing evidence that smell propensity signals can support human judgement. Taken together, our work lays the groundwork for integrating quality-aware assessments into the evaluation and deployment of LLMs for code.

Related papers