Towards Automated Detection of Inline Code Comment Smells
- URL: http://arxiv.org/abs/2504.18956v1
- Date: Sat, 26 Apr 2025 15:38:14 GMT
- Title: Towards Automated Detection of Inline Code Comment Smells
- Authors: Ipek Oztas, U Boran Torun, Eray Tüzün
- Abstract summary: We aim to automatically detect and classify inline code comment smells through machine learning (ML) models and a large language model (LLM). In parallel, we trained and tested seven different machine learning algorithms on the augmented dataset to compare their classification performance against GPT-4. The performance of these models, particularly Random Forest, which achieved an overall accuracy of 69 percent, establishes a solid baseline for future research in this domain.
- Score: 2.2134505920972547
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Code comments are important in software development because they directly influence software maintainability and overall quality. Bad commenting practices lead to code comment smells, which negatively impact software maintenance. Recent research has classified inline code comment smells, yet automatically detecting them remains a challenge. We aim to automatically detect and classify inline code comment smells through machine learning (ML) models and a large language model (LLM) to determine how accurately each smell type can be detected. We enhanced a previously labeled dataset, in which comments are labeled according to a predetermined taxonomy, by augmenting it with additional code segments and their associated comments. GPT-4, a large language model, was used to classify code comment smells on both the original and augmented datasets to evaluate its performance. In parallel, we trained and tested seven different machine learning algorithms on the augmented dataset to compare their classification performance against GPT-4. The performance of these models, particularly Random Forest, which achieved an overall accuracy of 69 percent, along with Gradient Boosting and Logistic Regression at 66 percent and 65 percent respectively, establishes a solid baseline for future research in this domain. The Random Forest model outperformed all other ML models, achieving the highest Matthews Correlation Coefficient (MCC) of 0.44. The augmented dataset improved the overall classification accuracy of the GPT-4 predictions from 34 percent to 55 percent. This study contributes to software maintainability by exploring the automatic detection and classification of inline code comment smells. We have made our augmented dataset and code artifacts available online, offering a valuable resource for developing automated comment smell detection tools.
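The classical ML baseline described in the abstract (train several classifiers on labeled comments, report accuracy and MCC) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the file name, column names ("comment_text", "smell_label"), and the TF-IDF featurization are assumptions.

```python
# Hedged sketch of a comment-smell classification baseline in scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical dataset layout: one row per inline comment with its smell label.
df = pd.read_csv("augmented_comment_smells.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["comment_text"], df["smell_label"],
    test_size=0.2, random_state=42, stratify=df["smell_label"])

models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)
    # Report the two metrics highlighted in the abstract: accuracy and MCC.
    print(f"{name}: acc={accuracy_score(y_test, preds):.2f} "
          f"MCC={matthews_corrcoef(y_test, preds):.2f}")
```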
Related papers
- Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 [0.0]
This study introduces a structured methodology and evaluation matrix to tackle this issue. The dataset spans four prominent programming languages: Java, Python, JavaScript, and C++. We benchmark two state-of-the-art LLMs, OpenAI GPT-4.0 and DeepSeek-V3, using precision, recall, and F1 score as evaluation metrics.
arXiv Detail & Related papers (2025-04-22T16:44:39Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- EnseSmells: Deep ensemble and programming language models for automated code smells detection [3.974095344344234]
A smell in software source code denotes an indication of suboptimal design and implementation decisions. This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics.
arXiv Detail & Related papers (2025-02-07T15:35:19Z)
- Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation [2.990411348977783]
Open-source datasets are used to train neural models for automated code review tasks.
These datasets contain a significant amount of noisy comments that persist despite cleaning methods.
We propose a novel approach using large language models (LLMs) to further clean these datasets.
arXiv Detail & Related papers (2025-02-04T22:48:58Z)
- How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study [45.126233498200534]
We introduce CodeSmellEval, a benchmark designed to evaluate the propensity of Large Language Models for generating code smells. Our benchmark includes a novel metric, the Propensity Smelly Score (PSC), and a curated dataset of method-level code smells, CodeSmellData. To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral.
arXiv Detail & Related papers (2024-12-25T21:56:35Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns weights based on the training dynamics of the classifiers to the distantly supervised labels.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- An Empirical Study on Predictability of Software Code Smell Using Deep Learning Models [3.2973778921083357]
A code smell is a surface indication of deeper problems in software writing practices.
Recent studies have observed that code containing code smells is often prone to a higher probability of change during the software development cycle.
We developed code smell prediction models with the help of features extracted from source code to predict eight types of code smell.
arXiv Detail & Related papers (2021-08-08T12:36:23Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
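The view-construction step SCARF describes, corrupting a random subset of features, can be sketched briefly. This is a hedged illustration assuming tabular numeric data and a per-element corruption rate; the paper's exact corruption scheme and hyperparameters may differ.

```python
# Sketch of SCARF-style view generation: replace a random subset of features
# with values resampled from the batch's empirical per-feature distribution.
import numpy as np

def scarf_corrupt(x_batch: np.ndarray, corruption_rate: float = 0.6,
                  rng: np.random.Generator | None = None) -> np.ndarray:
    """Return a corrupted view of a (batch, features) array."""
    rng = rng or np.random.default_rng()
    n, d = x_batch.shape
    # Which entries to corrupt (illustrative per-element Bernoulli mask).
    mask = rng.random((n, d)) < corruption_rate
    # Replacement values drawn per feature from the batch's marginal.
    random_rows = rng.integers(0, n, size=(n, d))
    replacements = x_batch[random_rows, np.arange(d)]
    return np.where(mask, replacements, x_batch)

# Two corrupted views of the same batch would form a positive pair
# for a contrastive objective such as InfoNCE.
x = np.random.default_rng(0).normal(size=(8, 5))
view_a, view_b = scarf_corrupt(x), scarf_corrupt(x)
```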
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)