Fugu-MT Paper Translation (Abstract): Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset

Paper overview: Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset

arxiv url: http://arxiv.org/abs/2508.11958v1
Date: Sat, 16 Aug 2025 07:40:58 GMT
Status: Translation complete
System update date: 2025-08-19 14:49:10.484537
Title: Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset
Title (reference translation): Clean Code and Better Models: Improving LLM Performance with a Smell-Cleaned Dataset
Authors: Zhipeng Xue, Xiaoting Zhang, Zhipeng Gao, Xing Hu, Shan Gao, Xin Xia, Shanping Li
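Abstract summary: This is the first systematic study to assess and improve dataset quality in terms of code smells. It proposes SmellCC, an LLM-based code smell cleaning tool that automatically removes code smells.
Reference score (independently computed attention score): 13.23492570818459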
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have demonstrated great potential in code-related tasks. However, most research focuses on improving the output quality of LLMs (e.g., correctness), and less attention has been paid to the LLM input (e.g., the training code quality). Given that code smells widely exist in practice and can negatively impact software maintainability and readability, this study is the first systematic research to assess and improve dataset quality in terms of code smells. In this work, we first conduct a preliminary study to explore the presence of code smells in a popular benchmark dataset (i.e., CodeSearchNet-Python) and evaluate the output of several popular LLMs (i.e., DeepSeek-Coder, CodeLlama, and MagiCoder), revealing that code smell issues extensively exist in LLMs' input (e.g., benchmark dataset) and output (e.g., generated code). We then conduct our systematic research in three main steps. First, we propose an LLM-based code smell cleaning tool, named SmellCC, which automatically refactors and removes code smells. To evaluate the correctness of the code refactoring, we construct a test set of 50 repositories sourced from the CodeSearchNet-Python benchmark for functional testing. Second, we apply our curated smell-cleaned dataset to fine-tune two LLMs (i.e., DeepSeek-V2 and Qwen-Coder) to explore their potential for generating high-quality code. Third, we investigate the impact of code smells on two downstream tasks: code completion and code search. Lastly, we derive several actionable implications for software engineering researchers and industry practitioners from our findings.
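Abstract (reference translation): Large Language Models (LLMs) have shown great potential in code-related tasks. However, most research focuses on improving the output quality of LLMs (e.g., correctness), and little attention has been paid to the LLM input (e.g., the quality of the training code). Considering that code smells exist widely in practice and can negatively affect software maintainability and readability, we conduct the first systematic study to assess and improve dataset quality in terms of code smells. In this study, we first investigate the presence of code smells in a popular benchmark dataset (i.e., CodeSearchNet-Python) and evaluate the output of several popular LLMs (i.e., DeepSeek-Coder, CodeLlama, and MagiCoder), revealing that code smells exist extensively in the LLM input (e.g., benchmark datasets) and output (e.g., generated code). First, we propose SmellCC, an LLM-based code smell cleaning tool that automatically refactors and removes code smells. To evaluate the correctness of the code refactoring, we construct a test set of 50 repositories drawn from the CodeSearchNet-Python benchmark for functional testing. Next, we use the curated smell-cleaned dataset to fine-tune two LLMs (DeepSeek-V2 and Qwen-Coder) and explore their potential for generating high-quality code. Third, we examine the impact of code smells on two downstream tasks, namely code completion and code search. Finally, from our findings we derive several actionable implications for software engineering researchers and industry practitioners.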

Related papers