Investigating the Efficacy of Large Language Models for Code Clone
Detection
- URL: http://arxiv.org/abs/2401.13802v3
- Date: Tue, 30 Jan 2024 06:10:29 GMT
- Title: Investigating the Efficacy of Large Language Models for Code Clone
Detection
- Authors: Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema
Rodr\'iguez-P\'erez, Mohamed Sami Shehata
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks.
In this study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task.
ChatGPT surpasses the baselines in cross-language CCD attaining an F1-score of 0.877 and achieves comparable performance to fully fine-tuned models for mono-lingual CCD.
- Score: 2.0749231618270803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable success in various
natural language processing and software engineering tasks, such as code
generation. The LLMs are mainly utilized in the prompt-based zero/few-shot
paradigm to guide the model in accomplishing the task. GPT-based models are one
of the popular ones studied for tasks such as code comment generation or test
generation. These tasks are `generative' tasks. However, there is limited
research on the usage of LLMs for `non-generative' tasks such as classification
using the prompt-based paradigm. In this preliminary exploratory study, we
investigated the applicability of LLMs for Code Clone Detection (CCD), a
non-generative task. By building a mono-lingual and cross-lingual CCD dataset
derived from CodeNet, we first investigated two different prompts using ChatGPT
to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot
setting. We then conducted an analysis to understand the strengths and
weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language
CCD attaining an F1-score of 0.877 and achieves comparable performance to fully
fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Also, the
prompt and the difficulty level of the problems has an impact on the
performance of ChatGPT. Finally we provide insights and future directions based
on our initial analysis
Related papers
- Large Language Models for cross-language code clone detection [3.5202378300682162]
Cross-lingual code clone detection has gained traction with the software engineering community.
Inspired by the significant advances in machine learning, this paper revisits cross-lingual code clone detection.
arXiv Detail & Related papers (2024-08-08T12:57:14Z) - Adaptable Logical Control for Large Language Models [68.27725600175013]
Ctrl-G is an adaptable framework that facilitates tractable and flexible control of model generation at inference time.
We show that Ctrl-G, when applied to a TULU2-7B model, outperforms GPT3.5 and GPT4 on the task of interactive text editing.
arXiv Detail & Related papers (2024-06-19T23:47:59Z) - AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual
Adaptation for Code Clone Detection [69.79627042058048]
AdaCCD is a novel cross-lingual adaptation method that can detect cloned codes in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
arXiv Detail & Related papers (2023-11-13T12:20:48Z) - Chatbots Are Not Reliable Text Annotators [0.0]
ChatGPT is a closed-source product which has major drawbacks with regards to transparency, cost, and data protection.
Recent advances in open-source (OS) large language models (LLMs) offer alternatives which remedy these challenges.
arXiv Detail & Related papers (2023-11-09T22:28:14Z) - Stay on topic with Classifier-Free Guidance [57.28934343207042]
We show that CFG can be used broadly as an inference-time technique in pure language modeling.
We show that CFG improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks.
arXiv Detail & Related papers (2023-06-30T17:07:02Z) - ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP)
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z) - Comparative Analysis of CHATGPT and the evolution of language models [0.0]
This paper highlights the prevailing ideas in NLP, including machine translation, machine summarization, question-answering, and language generation.
A strategy for validating the arguments and results of ChatGPT is presented summarily as an example of safe, large-scale adoption of Large Language Models.
arXiv Detail & Related papers (2023-03-28T03:11:28Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on
Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - Rationale-Guided Few-Shot Classification to Detect Abusive Language [5.977278650516324]
We propose RGFS (Rationale-Guided Few-Shot Classification) for abusive language detection.
We introduce two rationale-integrated BERT-based architectures (the RGFS models) and evaluate our systems over five different abusive language datasets.
arXiv Detail & Related papers (2022-11-30T14:47:14Z) - Evaluating few shot and Contrastive learning Methods for Code Clone
Detection [5.1623866691702744]
Code Clone Detection is a software engineering task that is used for plagiarism detection, code search, and code comprehension.
Deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of $sim$95% on the CodeXGLUE benchmark.
No previous study evaluates the generalizability of these models where a limited amount of annotated data is available.
arXiv Detail & Related papers (2022-04-15T15:01:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.