On the Use of Deep Learning Models for Semantic Clone Detection
- URL: http://arxiv.org/abs/2412.14739v1
- Date: Thu, 19 Dec 2024 11:15:02 GMT
- Title: On the Use of Deep Learning Models for Semantic Clone Detection
- Authors: Subroto Nag Pinku, Debajyoti Mondal, Chanchal K. Roy
- Abstract summary: We propose a multi-step evaluation approach for five state-of-the-art clone detection models leveraging existing benchmark datasets.
Specifically, we examine three highly-performing single-language models (ASTNN, GMN, CodeBERT) on BigCloneBench, SemanticCloneBench, and GPTCloneBench.
While single-language models show high F1 scores for BigCloneBench, their performance on SemanticCloneBench varies (up to 20%).
Interestingly, the cross-language model (C4) shows superior performance (around 7%) on SemanticCloneBench over other models.
- Abstract: Detecting and tracking code clones can ease various software development and maintenance tasks when changes in a code fragment should be propagated over all its copies. Several deep learning-based clone detection models have appeared in the literature for detecting syntactic and semantic clones, widely evaluated with the BigCloneBench dataset. However, class imbalance and the small number of semantic clones make BigCloneBench less ideal for interpreting model performance. Researchers also use other datasets such as GoogleCodeJam, OJClone, and SemanticCloneBench to understand model generalizability. To overcome the limitations of existing datasets, the GPT-assisted semantic and cross-language clone dataset GPTCloneBench has been released. However, how these models compare across datasets remains unclear. In this paper, we propose a multi-step evaluation approach for five state-of-the-art clone detection models leveraging existing benchmark datasets, including GPTCloneBench, and using mutation operators to study model ability. Specifically, we examine three highly-performing single-language models (ASTNN, GMN, CodeBERT) on BigCloneBench, SemanticCloneBench, and GPTCloneBench, testing their robustness with mutation operations. Additionally, we compare them against cross-language models (C4, CLCDSA) known for detecting semantic clones. While single-language models show high F1 scores for BigCloneBench, their performance on SemanticCloneBench varies (up to 20%). Interestingly, the cross-language model (C4) shows superior performance (around 7%) on SemanticCloneBench over other models and performs similarly on BigCloneBench and GPTCloneBench. On mutation-based datasets, C4 has more robust performance (less than 1% difference) compared to single-language models, which show high variability.
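The mutation operators used for the robustness study are only summarized above; as a minimal illustrative sketch (hypothetical helper, not the authors' implementation), a semantics-preserving mutation such as identifier renaming can be applied to one side of a clone pair, and a robust detector should give the same verdict before and after:
```python
import re

def rename_identifiers(code: str, mapping: dict[str, str]) -> str:
    """Semantics-preserving mutation: rename identifiers (a Type-2-style
    change), so a robust clone detector should give the same verdict on
    the mutated pair as on the original pair. Hypothetical helper, not
    the paper's actual mutation operator set."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)

original = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc"
print(rename_identifiers(original, {"acc": "running_sum", "xs": "items"}))
```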
Related papers
- Large Language Models for cross-language code clone detection [3.5202378300682162]
Cross-lingual code clone detection has gained traction within the software engineering community.
Inspired by the significant advances in machine learning, this paper revisits cross-lingual code clone detection.
We evaluate the performance of five Large Language Models (LLMs) and eight prompts for the identification of cross-lingual code clones.
arXiv Detail & Related papers (2024-08-08T12:57:14Z)
- Assessing the Code Clone Detection Capability of Large Language Models [0.0]
The evaluation involves testing the models on a variety of code pairs of different clone types and levels of similarity.
Findings indicate that GPT-4 consistently surpasses GPT-3.5 across all clone types.
arXiv Detail & Related papers (2024-07-02T16:20:44Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Using Ensemble Inference to Improve Recall of Clone Detection [0.0]
Large-scale source-code clone detection is a challenging task.
We employ four state-of-the-art neural network models and evaluate them individually and in combination.
The results, on an illustrative dataset of approximately 500K lines of C/C++ code, suggest ensemble inference outperforms individual models in all trialled cases.
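The paper's exact combination rule is not given in this summary; a minimal sketch of why ensembling can raise recall, assuming each detector returns a boolean verdict per candidate pair:
```python
def ensemble_is_clone(votes: list[bool], min_votes: int = 1) -> bool:
    """Combine per-model clone verdicts. min_votes=1 is a logical OR:
    a pair counts as a clone if any model flags it, trading precision
    for recall; higher thresholds behave like majority voting."""
    return sum(votes) >= min_votes

# Verdicts from four hypothetical detectors on one candidate pair:
print(ensemble_is_clone([False, True, False, False]))               # OR -> True
print(ensemble_is_clone([False, True, False, False], min_votes=3))  # vote -> False
```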
arXiv Detail & Related papers (2024-02-12T09:44:59Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Unveiling the potential of large language models in generating semantic and cross-language clones [8.791710193028905]
OpenAI's GPT model has potential for such clone generation, as GPT is designed for text generation.
In the realm of semantic clones, GPT-3 attains an impressive accuracy of 62.14% and 0.55 BLEU score, achieved through few-shot prompt engineering.
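The exact prompts are not reproduced in this summary; as an assumption-labeled sketch, a few-shot prompt for semantic clone generation might be assembled along these lines (the wording is illustrative, not the paper's prompt):
```python
def build_few_shot_prompt(examples: list[tuple[str, str]], target: str) -> str:
    """Assemble a few-shot prompt asking a model for a semantic clone:
    same behaviour, different syntax. Illustrative wording only."""
    parts = ["Rewrite each function so it behaves identically but looks different.\n"]
    for src, clone in examples:
        parts.append(f"Original:\n{src}\nClone:\n{clone}\n")
    parts.append(f"Original:\n{target}\nClone:\n")
    return "".join(parts)

demo = [("def inc(x): return x + 1", "def inc(x): x += 1; return x")]
print(build_few_shot_prompt(demo, "def dbl(x): return x * 2"))
```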
arXiv Detail & Related papers (2023-09-12T17:40:49Z)
- GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench [1.8687918300580921]
We present a comprehensive semantic clone and cross-language clone benchmark, GPTCloneBench, by exploiting SemanticCloneBench and OpenAI's GPT-3 model.
From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python).
arXiv Detail & Related papers (2023-08-26T21:50:34Z)
- ZC3: Zero-Shot Cross-Language Code Clone Detection [79.53514630357876]
We propose a novel method named ZC3 for Zero-shot Cross-language Code Clone detection.
ZC3 designs the contrastive snippet prediction to form an isomorphic representation space among different programming languages.
Based on this, ZC3 exploits domain-aware learning and cycle consistency learning to generate representations that are aligned among different languages and are diacritical for different types of clones.
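ZC3's exact objective is not spelled out in this summary; a generic contrastive (InfoNCE-style) loss over paired code embeddings, the usual way to align clone pairs across languages, might look like this sketch:
```python
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, temp: float = 0.07) -> float:
    """Generic InfoNCE-style contrastive loss (not ZC3's exact objective):
    row i of `anchors` should be most similar to row i of `positives`,
    pulling clone pairs together across languages and pushing others apart."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temp                    # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# e.g. 4 snippets in one language paired with their clones in another, 16-d embeddings
rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(4, 16)), rng.normal(size=(4, 16))))
```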
arXiv Detail & Related papers (2023-08-26T03:48:10Z)
- The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI.
We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts.
We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
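For concreteness, a common IRT formulation (the two-parameter logistic model; whether the paper uses 2PL or a variant is not stated in this summary) gives the probability that a model of a given ability answers an item correctly:
```python
import math

def p_correct(ability: float, difficulty: float, discrimination: float) -> float:
    """Two-parameter logistic (2PL) IRT: P(correct) = sigmoid(a * (theta - b)).
    Items with high discrimination separate strong from weak models sharply."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

print(p_correct(ability=1.0, difficulty=0.0, discrimination=2.0))  # ~0.88
```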
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Semantic Clone Detection via Probabilistic Software Modeling [69.43451204725324]
This article contributes a semantic clone detection approach that detects clones that have 0% syntactic similarity.
We present SCD-PSM as a stable and precise solution to semantic clone detection.
arXiv Detail & Related papers (2020-08-11T17:54:20Z)
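To illustrate what clones with 0% syntactic similarity means, here is a hand-written pair of semantic clones: both compute the sum 1 + 2 + ... + n, with essentially no token overlap:
```python
def sum_loop(n: int) -> int:
    # Accumulate the sum iteratively.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n: int) -> int:
    # Same result via Gauss's closed form.
    return n * (n + 1) // 2

assert sum_loop(100) == sum_formula(100) == 5050
```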
This list is automatically generated from the titles and abstracts of the papers on this site.