An Empirical Study on Predictability of Software Code Smell Using Deep
Learning Models
- URL: http://arxiv.org/abs/2108.04659v1
- Date: Sun, 8 Aug 2021 12:36:23 GMT
- Title: An Empirical Study on Predictability of Software Code Smell Using Deep
Learning Models
- Authors: Himanshu Gupta, Tanmay G. Kulkarni, Lov Kumar, Lalita Bhanu Murthy
Neti and Aneesh Krishna
- Abstract summary: A code smell is a surface indication of a deeper problem in software writing practices.
Recent studies have observed that code containing smells is more likely to change during the software development cycle.
We developed prediction models from features extracted from source code to predict eight types of code smell.
- Score: 3.2973778921083357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A code smell, much like a bad odor, is a surface indication of
something tainted in software writing practices. It points to a deeper
problem lying within the code, one that experienced software developers with
sound coding practices readily recognize. Recent studies have observed that
code containing smells is more prone to change during the software
development cycle. In this paper, we developed code smell prediction models
using features extracted from source code to predict eight types of code
smell. Our work also applies data sampling techniques to handle the class
imbalance problem and feature selection techniques to find relevant feature
sets. Previous studies used techniques such as Naive Bayes and Random Forest
but had not explored deep learning methods for predicting code smell. A
total of 576 distinct deep learning models were trained using the features
and datasets mentioned above. The study concluded that deep learning models
trained on data resampled with the Synthetic Minority Oversampling Technique
(SMOTE) gave better results in terms of accuracy and AUC, with the accuracy
of some models improving from 88.47% to 96.84%.
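As a rough illustration of the pipeline described above, the sketch below
combines feature selection, SMOTE oversampling, and a small neural
classifier. The data, feature count, and network size are placeholders, not
the paper's actual setup.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# X: source-code metrics per class/method, y: 1 if the smell is present.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))           # placeholder metric matrix
y = (rng.random(1000) < 0.1).astype(int)  # imbalanced smell labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Keep the most discriminative metrics (one of several selection options).
selector = SelectKBest(f_classif, k=20).fit(X_tr, y_tr)
X_tr, X_te = selector.transform(X_tr), selector.transform(X_te)

# 2) Balance the minority (smelly) class with SMOTE, on training data only.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# 3) Train a small feed-forward network and report accuracy / AUC.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X_bal, y_bal)
proba = clf.predict_proba(X_te)[:, 1]
print(accuracy_score(y_te, proba > 0.5), roc_auc_score(y_te, proba))
```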
Related papers
- EnseSmells: Deep ensemble and programming language models for automated code smells detection [3.974095344344234]
A smell in software source code is an indication of suboptimal design and implementation decisions.
This paper proposes a novel approach to code smell detection, constructing a deep learning architecture that emphasizes the fusion of structural features and statistical semantics.
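A hypothetical sketch of that fusion idea: project structural metrics and a
semantic code embedding into a shared space, concatenate, and classify. The
layer sizes, embedding dimension, and overall architecture are assumptions,
not EnseSmells' actual design.

```python
import torch
import torch.nn as nn

class FusionSmellDetector(nn.Module):
    def __init__(self, n_metrics=40, emb_dim=768, hidden=128):
        super().__init__()
        self.metric_net = nn.Sequential(nn.Linear(n_metrics, hidden), nn.ReLU())
        self.semantic_net = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)  # fused representation -> smell logit

    def forward(self, metrics, embedding):
        fused = torch.cat([self.metric_net(metrics),
                           self.semantic_net(embedding)], dim=-1)
        return self.head(fused)

model = FusionSmellDetector()
logit = model(torch.randn(8, 40), torch.randn(8, 768))  # batch of 8 snippets
```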
arXiv Detail & Related papers (2025-02-07T15:35:19Z)
- ChatGPT Code Detection: Techniques for Uncovering the Source of Code [0.0]
We use advanced classification techniques to differentiate between code written by humans and that generated by ChatGPT.
We employ a new approach that combines powerful embedding features (black-box) with supervised learning algorithms.
We show that untrained humans solve the same task no better than random guessing.
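A minimal sketch of that black-box recipe: embed each snippet, then fit an
ordinary supervised classifier. The toy `embed` helper and the classifier
choice here are illustrative; real features would come from a pre-trained
embedding model, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(snippets):
    # Placeholder embedding: surface counts stand in for the black-box
    # embedding features used in the paper.
    return np.array([[len(s), s.count("\n"), s.count(" ")] for s in snippets], float)

human = ["def add(a, b):\n    return a + b"]
generated = ["def add(a: int, b: int) -> int:\n    \"\"\"Add two integers.\"\"\"\n    return a + b"]

X = embed(human + generated)
y = [0] * len(human) + [1] * len(generated)  # 0 = human, 1 = generated
clf = LogisticRegression().fit(X, y)
```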
arXiv Detail & Related papers (2024-05-24T12:56:18Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Rethinking Negative Pairs in Code Search [56.23857828689406]
We propose a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE.
We analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deriving a more precise estimate of mutual information.
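One plausible reading of that weighting, as a sketch rather than the paper's
exact formulation: standard InfoNCE weights every negative pair equally,
while here each negative's term is scaled by a weight. The temperature `tau`
and the unit weights are placeholders.

```python
import torch
import torch.nn.functional as F

def soft_info_nce(query, pos, negs, neg_weights, tau=0.07):
    """query: (d,), pos: (d,), negs: (n, d), neg_weights: (n,)."""
    q = F.normalize(query, dim=0)
    p = F.normalize(pos, dim=0)
    n = F.normalize(negs, dim=1)
    pos_term = torch.exp(q @ p / tau)
    neg_terms = neg_weights * torch.exp(n @ q / tau)  # weights soften negatives
    return -torch.log(pos_term / (pos_term + neg_terms.sum()))

loss = soft_info_nce(torch.randn(128), torch.randn(128),
                     torch.randn(16, 128), torch.ones(16))
```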
arXiv Detail & Related papers (2023-10-12T06:32:42Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand through using contextual data improves the performance of pre-trained code language models for the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
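The freezing idea itself is easy to sketch; the model name and cut-off layer
below are arbitrary choices, not Telly's reported configuration.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/codebert-base")
FREEZE_BELOW = 8  # freeze embeddings + layers 0..7, fine-tune the rest

for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:FREEZE_BELOW]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the unfrozen parameters participate in fine-tuning.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```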
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- What do pre-trained code models know about code? [9.60966128833701]
We use diagnostic tasks called probes to investigate pre-trained code models.
BERT (pre-trained on English), CodeBERT and CodeBERTa (pre-trained on source code, and natural language documentation), and GraphCodeBERT (pre-trained on source code with dataflow) are investigated.
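A minimal probe, assuming the CodeBERT checkpoint named in the summary:
freeze the encoder, pool one vector per snippet, and train only a linear
classifier on top; probe accuracy then suggests how much of a property the
representation encodes. The toy task and pooling choice are illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
enc = AutoModel.from_pretrained("microsoft/codebert-base").eval()

snippets = ["def f(x): return x + 1", "int f(int x) { return x + 1; }"]
labels = [0, 1]  # e.g., a toy "which language?" probing task

with torch.no_grad():  # encoder stays frozen; only the probe is trained
    batch = tok(snippets, padding=True, return_tensors="pt")
    feats = enc(**batch).last_hidden_state[:, 0]  # [CLS]-style pooled vectors

probe = LogisticRegression(max_iter=1000).fit(feats.numpy(), labels)
```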
arXiv Detail & Related papers (2021-08-25T16:20:17Z)
- Empirical Analysis on Effectiveness of NLP Methods for Predicting Code Smell [3.2973778921083357]
A code smell is a surface indicator of an inherent problem in the system.
We use three Extreme Learning Machine kernels over 629 packages to identify eight code smells.
Our findings indicate that the radial basis function kernel performs best of the three, with a mean accuracy of 98.52%.
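For reference, a minimal RBF-kernel extreme learning machine in closed form;
the regularization constant and gamma below are illustrative, not the
settings behind the reported 98.52% figure.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

class KernelELM:
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = X
        T = np.eye(y.max() + 1)[y]  # one-hot targets
        K = rbf_kernel(X, X, gamma=self.gamma)
        # Output weights in closed form: beta = (K + I/C)^-1 T
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X_new):
        return rbf_kernel(X_new, self.X, gamma=self.gamma).dot(self.beta).argmax(1)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.integers(0, 2, 100)
pred = KernelELM().fit(X, y).predict(X)
```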
arXiv Detail & Related papers (2021-08-08T12:10:20Z)