Fugu-MT Paper Translation (Abstract): Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models

Paper overview: Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models

arxiv url: http://arxiv.org/abs/2507.05289v2
Date: Wed, 09 Jul 2025 18:24:41 GMT
Status: Translation complete
System update date: 2025-07-11 12:24:00.071452
Title: Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models
Title (reference translation): Measuring the effect of changes in code readability attributes on code quality evaluation by large language models
Authors: Igor Regis da Silva Simoes, Elaine Venson
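Abstract summary: Code readability is one of the main aspects of code quality and is influenced by various properties such as identifier names, comments, code structure, and adherence to standards. This paper explores using Large Language Models (LLMs) to evaluate code quality attributes related to readability in a standardized, reproducible, and consistent manner.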
Reference score (independently computed attention score): 2.3204178451683264
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code readability is one of the main aspects of code quality, influenced by various properties such as identifier names, comments, code structure, and adherence to standards. However, measuring this attribute poses challenges in both industry and academia. While static analysis tools assess attributes such as code smells and comment percentage, code reviews introduce an element of subjectivity. This paper explores using Large Language Models (LLMs) to evaluate code quality attributes related to readability in a standardized, reproducible, and consistent manner. We conducted a quasi-experimental study to measure the effects of code changes on LLMs' interpretation of the readability quality attribute. Nine LLMs were tested, undergoing three interventions: removing comments, replacing identifier names with obscure names, and refactoring to remove code smells. Each intervention involved 10 batch analyses per LLM, collecting data on response variability. We compared the results with a known reference model and tool. The results showed that all LLMs were sensitive to the interventions, with agreement with the reference classifier being high for the original and refactored code scenarios. The LLMs demonstrated a strong semantic sensitivity that the reference model did not fully capture. A thematic analysis of the LLMs' reasoning confirmed that their evaluations directly reflected the nature of each intervention. The models also exhibited response variability, with 9.37% to 14.58% of executions showing a standard deviation greater than zero, indicating response oscillation, though this did not always compromise the statistical significance of the results. LLMs demonstrated potential for evaluating semantic quality aspects, such as coherence between identifier names, comments, and documentation with code purpose.
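Abstract (reference translation): Code readability is one of the main aspects of code quality and is influenced by various properties such as identifier names, comments, code structure, and adherence to standards. Measuring this attribute, however, is challenging for both industry and academia. Static analysis tools assess attributes such as code smells and comment percentage, whereas code reviews introduce an element of subjectivity. This paper uses Large Language Models (LLMs) to evaluate code quality attributes related to readability in a standardized, reproducible, and consistent manner. We conducted a quasi-experiment to assess how code changes affect the LLMs' interpretation of the readability quality attribute. Nine LLMs were tested under three interventions: removing comments, replacing identifier names with obscure names, and refactoring to remove code smells. Each intervention comprised 10 batch analyses per LLM, and data on response variability were collected. The results were compared with a known reference model and tool. All LLMs were sensitive to the interventions, and agreement with the reference classifier was high for the original and refactored code scenarios. The LLMs showed a strong semantic sensitivity that the reference model did not fully capture. A thematic analysis of the LLMs' reasoning confirmed that their evaluations directly reflected the nature of each intervention. The models also showed response variability: 9.37% to 14.58% of executions had a standard deviation greater than zero, indicating response oscillation, although this did not always compromise the statistical significance of the results. The LLMs demonstrated potential for evaluating semantic quality aspects, such as the coherence of identifier names, comments, and documentation with the purpose of the code.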
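The response-variability figure reported in the abstract (the share of executions whose standard deviation is greater than zero) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' procedure: the function name `oscillation_rate`, the grouping of one "execution" as the repeated scores for a single code sample under a single scenario, the integer score scale, and the sample data are all hypothetical.

```python
from statistics import pstdev


def oscillation_rate(scores_by_item):
    """Return the fraction of items whose repeated scores are not all identical.

    scores_by_item maps a hypothetical item key (code sample + scenario)
    to the list of scores an LLM gave across repeated runs.
    """
    if not scores_by_item:
        raise ValueError("no items to analyse")
    # A population standard deviation above zero means at least one
    # repeated run disagreed with the others (response oscillation).
    oscillating = sum(
        1 for scores in scores_by_item.values() if pstdev(scores) > 0
    )
    return oscillating / len(scores_by_item)


# Hypothetical data: 10 repeated scores per item (assumed 1-5 scale).
example = {
    "ClassA/original": [4] * 10,
    "ClassA/no_comments": [3, 3, 4, 3, 3, 3, 3, 3, 3, 3],
    "ClassB/original": [5] * 10,
    "ClassB/obscure_names": [2] * 10,
}

rate = oscillation_rate(example)
print(f"{rate:.2%} of items show a standard deviation greater than zero")
```

In this made-up data, only one of the four items has non-identical repeated scores, so the sketch reports a 25% oscillation rate. The abstract's 9.37% to 14.58% range would correspond to the same kind of ratio computed per model over the paper's own data.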

Related papers