Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell
- URL: http://arxiv.org/abs/2108.04656v1
- Date: Sun, 8 Aug 2021 12:10:20 GMT
- Title: Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell
- Authors: Himanshu Gupta, Abhiram Anand Gulanikar, Lov Kumar and Lalita Bhanu Murthy Neti
- Abstract summary: A code smell is a surface indicator of an inherent problem in the system.
We use three Extreme Learning Machine kernels over 629 packages to identify eight code smells.
Our findings indicate that the radial basis function kernel performs best of the three kernel methods, with a mean accuracy of 98.52%.
- Score: 3.2973778921083357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A code smell is a surface indicator of an inherent problem in the system,
most often due to deviation from standard coding practices on the developer's
part during the development phase. Studies observe that code containing code
smells is more likely to require modifications and corrections than code that
does not. Restructuring the code at an early stage of development saves the
exponentially increasing effort that would otherwise be required to address
the issues stemming from these code smells. Instead of using traditional
features to detect code smells, we manually construct features from user
comments to predict code smells. We apply three Extreme Learning Machine
kernels over 629 packages to identify eight code smells, leveraging feature
engineering and sampling techniques. Our findings indicate that the radial
basis function kernel performs best of the three kernel methods, with a mean
accuracy of 98.52%.
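For intuition, the kernel Extreme Learning Machine has a closed-form training
step that can be sketched in a few lines. The following is a minimal
illustration with an RBF kernel over TF-IDF features built from user comments;
the vectorizer, hyperparameters, and toy data are assumptions for illustration,
not the authors' exact pipeline.

```python
# Minimal kernel ELM sketch (assumed setup, not the paper's exact pipeline):
# output weights beta = (I/C + K)^-1 T, prediction argmax of k(x) @ beta.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import rbf_kernel

class KernelELM:
    def __init__(self, C=10.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        T = np.eye(int(y.max()) + 1)[y]          # one-hot targets
        K = rbf_kernel(X, X, gamma=self.gamma)   # RBF kernel matrix
        n = K.shape[0]
        self.beta = np.linalg.solve(np.eye(n) / self.C + K, T)
        return self

    def predict(self, X):
        k = rbf_kernel(X, self.X_train, gamma=self.gamma)
        return (k @ self.beta).argmax(axis=1)

# Hypothetical usage: user comments -> TF-IDF -> smelly (1) vs. clean (0).
comments = [
    "this class does far too many things at once",
    "clean, small, well-factored module",
]
labels = np.array([1, 0])
X = TfidfVectorizer().fit_transform(comments).toarray()
print(KernelELM().fit(X, labels).predict(X))     # -> [1 0]
```

The other two kernels the paper compares can drop in by swapping rbf_kernel
for sklearn's linear_kernel or polynomial_kernel, and the class imbalance
across the eight smells is what motivates the sampling step mentioned in the
abstract.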
Related papers
- EnseSmells: Deep ensemble and programming language models for automated code smells detection [3.974095344344234]
A smell in software source code indicates suboptimal design and implementation decisions.
This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics.
arXiv Detail & Related papers (2025-02-07T15:35:19Z)
- How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study [45.126233498200534]
We introduce CodeSmellEval, a benchmark designed to evaluate the propensity of Large Language Models for generating code smells.
Our benchmark includes a novel metric: Propensity Smelly Score (PSC), and a curated dataset of method-level code smells: CodeSmellData.
To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral.
arXiv Detail & Related papers (2024-12-25T21:56:35Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to outperform existing popular codes in decoding performance by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes.
We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective at detecting machine-generated code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- Augmenting Diffs With Runtime Information [53.22981451758425]
Collector-Sahab is a tool that augments code diffs with runtime difference information.
We run Collector-Sahab on 584 code diffs for Defects4J bugs and find it successfully augments the code diff for 95% (555/584) of them.
arXiv Detail & Related papers (2022-12-20T16:33:51Z)
- An Empirical Study on Predictability of Software Code Smell Using Deep Learning Models [3.2973778921083357]
A code smell is a surface indication of something tainted in terms of software writing practices.
Recent studies have observed that code containing code smells is prone to a higher probability of change during the software development cycle.
We developed code smell prediction models with the help of features extracted from source code to predict eight types of code smell.
arXiv Detail & Related papers (2021-08-08T12:36:23Z)
- Deep Learning to Ternary Hash Codes by Continuation [8.920717493647121]
We propose to jointly learn the features with the codes by appending a smoothed function to the networks.
Experiments show that the generated codes can indeed achieve higher retrieval accuracy.
arXiv Detail & Related papers (2021-07-16T16:02:08Z)
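As an aside on the continuation idea in the last entry above, a smooth
surrogate can approach ternary quantization {-1, 0, +1} as a temperature
parameter grows while staying differentiable; the tanh form below is an
illustrative assumption, not necessarily the paper's exact smoothed function.

```python
import numpy as np

def smoothed_ternary(x, beta):
    # As beta -> infinity this approaches -1 for x < -0.5, 0 for |x| < 0.5,
    # and +1 for x > 0.5, yet stays differentiable, so it can be appended
    # to a network and trained end to end while beta is increased gradually.
    return 0.5 * (np.tanh(beta * (x - 0.5)) + np.tanh(beta * (x + 0.5)))

print(np.round(smoothed_ternary(np.array([-1.0, 0.0, 1.0]), beta=50), 3))
# -> [-1.  0.  1.]
```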
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.