Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell
- URL: http://arxiv.org/abs/2108.04656v1
- Date: Sun, 8 Aug 2021 12:10:20 GMT
- Title: Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell
- Authors: Himanshu Gupta, Abhiram Anand Gulanikar, Lov Kumar and Lalita Bhanu Murthy Neti
- Abstract summary: A code smell is a surface indicator of an inherent problem in the system.
We use three Extreme Learning Machine kernels over 629 packages to identify eight code smells.
Our findings indicate that the radial basis function kernel performs best of the three kernel methods, with a mean accuracy of 98.52%.
- Score: 3.2973778921083357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A code smell is a surface indicator of an inherent problem in the system,
most often due to deviation from standard coding practices on the developer's
part during the development phase. Studies observe that code containing code
smells is more likely to require modifications and corrections than code that
does not. Restructuring the code at an early stage of development saves the
exponentially increasing effort that would otherwise be required to address
the issues stemming from these code smells. Instead of using traditional
features to detect code smells, we manually construct features from user
comments to predict code smells. We apply three Extreme Learning Machine
kernels over 629 packages to identify eight code smells, leveraging feature
engineering and sampling techniques. Our findings indicate that the radial
basis function kernel performs best of the three kernel methods, with a mean
accuracy of 98.52%.
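For intuition, the kernel Extreme Learning Machine has a closed-form training
step that can be sketched in a few lines. The following is a minimal
illustration with an RBF kernel over TF-IDF features built from user comments;
the vectorizer, hyperparameters, and toy data are assumptions for illustration,
not the authors' exact pipeline.

```python
# Minimal kernel ELM sketch (assumed setup, not the paper's exact pipeline):
# output weights beta = (I/C + K)^-1 T, prediction argmax of k(x) @ beta.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import rbf_kernel

class KernelELM:
    def __init__(self, C=10.0, gamma=0.5):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        T = np.eye(int(y.max()) + 1)[y]          # one-hot targets
        K = rbf_kernel(X, X, gamma=self.gamma)   # RBF kernel matrix
        n = K.shape[0]
        self.beta = np.linalg.solve(np.eye(n) / self.C + K, T)
        return self

    def predict(self, X):
        k = rbf_kernel(X, self.X_train, gamma=self.gamma)
        return (k @ self.beta).argmax(axis=1)

# Hypothetical usage: user comments -> TF-IDF -> smelly (1) vs. clean (0).
comments = [
    "this class does far too many things at once",
    "clean, small, well-factored module",
]
labels = np.array([1, 0])
X = TfidfVectorizer().fit_transform(comments).toarray()
print(KernelELM().fit(X, labels).predict(X))     # -> [1 0]
```

The other two kernels the paper compares can drop in by swapping rbf_kernel
for sklearn's linear_kernel or polynomial_kernel, and the class imbalance
across the eight smells is what motivates the sampling step mentioned in the
abstract.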
Related papers
- EnseSmells: Deep ensemble and programming language models for automated code smells detection [3.974095344344234]
A smell in software source code indicates suboptimal design and implementation decisions.
This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that places importance on the fusion of structural features and statistical semantics.
arXiv Detail & Related papers (2025-02-07T15:35:19Z)
- How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study [45.126233498200534]
We introduce CodeSmellEval, a benchmark designed to evaluate the propensity of Large Language Models for generating code smells.
Our benchmark includes a novel metric: Propensity Smelly Score (PSC), and a curated dataset of method-level code smells: CodeSmellData.
To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral.
arXiv Detail & Related papers (2024-12-25T21:56:35Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- Factor Graph Optimization of Error-Correcting Codes for Belief Propagation Decoding [62.25533750469467]
Low-Density Parity-Check (LDPC) codes possess several advantages over other families of codes.
The proposed approach is shown to outperform existing popular codes in decoding performance by orders of magnitude.
arXiv Detail & Related papers (2024-06-09T12:08:56Z)
- Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes.
We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective at detecting machine-generated code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- Augmenting Diffs With Runtime Information [53.22981451758425]
Collector-Sahab is a tool that augments code diffs with runtime difference information.
We run Collector-Sahab on 584 code diffs for Defects4J bugs and find it successfully augments the code diff for 95% (555/584) of them.
arXiv Detail & Related papers (2022-12-20T16:33:51Z)
- An Empirical Study on Predictability of Software Code Smell Using Deep Learning Models [3.2973778921083357]
A code smell is a surface indication of something tainted in terms of software writing practices.
Recent studies have observed that code containing code smells is prone to a higher probability of change during the software development cycle.
We developed code smell prediction models with the help of features extracted from source code to predict eight types of code smell.
arXiv Detail & Related papers (2021-08-08T12:36:23Z)
- Deep Learning to Ternary Hash Codes by Continuation [8.920717493647121]
We propose to jointly learn the features with the codes by appending a smoothed function to the networks.
Experiments show that the generated codes can indeed achieve higher retrieval accuracy.
arXiv Detail & Related papers (2021-07-16T16:02:08Z)
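As an aside on the continuation idea in the last entry above, a smooth
surrogate can approach ternary quantization {-1, 0, +1} as a temperature
parameter grows while staying differentiable; the tanh form below is an
illustrative assumption, not necessarily the paper's exact smoothed function.

```python
import numpy as np

def smoothed_ternary(x, beta):
    # As beta -> infinity this approaches -1 for x < -0.5, 0 for |x| < 0.5,
    # and +1 for x > 0.5, yet stays differentiable, so it can be appended
    # to a network and trained end to end while beta is increased gradually.
    return 0.5 * (np.tanh(beta * (x - 0.5)) + np.tanh(beta * (x + 0.5)))

print(np.round(smoothed_ternary(np.array([-1.0, 0.0, 1.0]), beta=50), 3))
# -> [-1.  0.  1.]
```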
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.