Enhancing Code Intelligence Tasks with ChatGPT
- URL: http://arxiv.org/abs/2312.15202v1
- Date: Sat, 23 Dec 2023 09:01:08 GMT
- Title: Enhancing Code Intelligence Tasks with ChatGPT
- Authors: Kang Yang, Xinjun Mao, Shangwen Wang, Tanghaoran Zhang, Bo Lin, Yanlin
Wang, Yihao Qin, Zhang Zhang, Xiaoguang Mao
- Abstract summary: ChatGPT-generated comments demonstrate superior semantic consistency with the code compared to human references.
We rebuild the widely used dataset, CodeSearchNet, with ChatGPT-generated comments.
Results show that the model pre-trained on ChatGPT-enhanced data outperforms its counterpart on code summarization, code generation, and code translation tasks.
- Score: 17.712126698173535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained code models have emerged as crucial tools in various code
intelligence tasks. However, their effectiveness depends on the quality of the
pre-training dataset, particularly the human reference comments, which serve as
a bridge between the programming language and natural language. One significant
challenge is that such comments can become inconsistent with the corresponding
code as the software evolves. This discrepancy can lead to suboptimal model
training and degraded performance. Large language models (LLMs) have
demonstrated superior capabilities in generating high-quality code comments. In
light of this, we tackle the dataset's quality issue by harnessing the power of
LLMs.
Specifically, we raise the question: Can we rebuild the pre-training dataset by
substituting the original comments with LLM-generated ones for more effective
pre-trained code models? To answer the question, we first conduct a
comprehensive evaluation to compare ChatGPT-generated comments with human
reference comments. As existing reference-based metrics treat the reference
comments as gold standards, we introduce two auxiliary tasks as novel
reference-free metrics to assess the quality of comments, i.e., code-comment
inconsistency detection and code search. Experimental results show that
ChatGPT-generated comments demonstrate superior semantic consistency with the
code compared to human references, indicating the potential of utilizing
ChatGPT to enhance the quality of the pre-training dataset. We then rebuilt the
widely used CodeSearchNet dataset with ChatGPT-generated comments and
re-pre-trained CodeT5 on the refined data. Evaluation results on four
generation tasks and one understanding task show that the model pre-trained on
ChatGPT-enhanced data outperforms its counterpart on code summarization, code
generation, and code translation.
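The pipeline described in the abstract is straightforward to prototype. Below is a minimal sketch, not the authors' released code: it regenerates a comment for one CodeSearchNet-style entry through the OpenAI chat API and scores code-comment consistency with an off-the-shelf sentence-embedding model as a rough stand-in for the paper's reference-free code-search metric. The prompt wording, model choices, and entry schema are all assumptions.
```python
# Minimal sketch of the comment-rebuilding idea; prompt, models, and
# the entry schema below are illustrative assumptions, not the paper's setup.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # expects OPENAI_API_KEY in the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small generic encoder

def regenerate_comment(code: str) -> str:
    """Ask ChatGPT for a fresh one-sentence comment for the code."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize this function in one sentence:\n\n" + code,
        }],
        temperature=0.0,  # keep outputs stable for dataset building
    )
    return response.choices[0].message.content.strip()

def consistency(code: str, comment: str) -> float:
    """Cosine similarity of code and comment embeddings: a crude,
    reference-free proxy for the paper's code-search-based metric."""
    code_vec, comment_vec = embedder.encode([code, comment])
    return float(util.cos_sim(code_vec, comment_vec))

# One CodeSearchNet-style entry with a stale human comment.
entry = {"code": "def add(a, b):\n    return a + b",
         "docstring": "Deprecated, do not use."}
candidate = regenerate_comment(entry["code"])
if consistency(entry["code"], candidate) > consistency(entry["code"], entry["docstring"]):
    entry["docstring"] = candidate  # keep the more consistent comment
```
Accepting a generated comment only when it scores higher than the human reference mirrors the paper's finding that ChatGPT comments tend to be more semantically consistent with the code, while guarding against occasional low-quality generations.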
Related papers
- Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs) and obtain the suggested code changes (patch sets) derived from real-world code-review comments.
Each model's performance is assessed by comparing its generated patch sets against historical human-written patch sets.
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
- Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [91.52444946362547]
We introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated with natural language.
We conducted experiments on three code-focused Large Language Models and observed consistent improvements in performance on two widely-used programming skill benchmarks.
arXiv Detail & Related papers (2024-02-20T13:56:38Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable improves the system's code generation performance.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation [2.0400340435492272]
This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model.
A dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis.
Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples.
arXiv Detail & Related papers (2023-11-05T12:56:40Z)
- Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study [0.0]
ChatGPT, a cutting-edge language model, has demonstrated impressive performance in various natural language processing tasks.
We conduct the first empirical study to understand the capabilities of ChatGPT in code review tasks.
Our results show that ChatGPT achieves higher EM and BLEU scores of 22.78 and 76.44, respectively, while the state-of-the-art method achieves only 15.50 and 62.88 on a high-quality code review dataset.
arXiv Detail & Related papers (2023-09-15T07:41:33Z)
- Stochastic Code Generation [1.7205106391379026]
Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code.
This issue is also observed in language modeling for long text generation.
In this study, we investigate whether techniques from long text generation can be applied to code generation to improve coherence.
arXiv Detail & Related papers (2023-04-14T00:01:05Z)
- ReCode: Robustness Evaluation of Code Generation Models [90.10436771217243]
We propose ReCode, a comprehensive robustness evaluation benchmark for code generation models.
We customize over 30 transformations specifically for code, covering docstrings, function and variable names, code syntax, and code format; a toy variable-renaming perturbation in this spirit is sketched after this list.
With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt.
arXiv Detail & Related papers (2022-12-20T14:11:31Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code, with synthetically created clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
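To make the ReCode-style perturbations mentioned in the list concrete, here is a toy sketch of a single semantics-preserving transformation: renaming variables with Python's ast module. It illustrates the idea only; ReCode's actual benchmark ships over 30 transformations with human-validated prompts, none of which are reproduced here.
```python
# Toy ReCode-style perturbation: rename variables without changing
# behavior. Requires Python 3.9+ for ast.unparse. This naive version
# renames every Name node, so it is only safe for self-contained
# functions that reference no globals or builtins.
import ast

class RenameVariables(ast.NodeTransformer):
    """Map every variable and argument name to v0, v1, ... in order."""

    def __init__(self):
        self.mapping = {}

    def _fresh(self, name: str) -> str:
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._fresh(node.id)
        return node

def perturb(source: str) -> str:
    """Return a behavior-preserving variant of `source`."""
    return ast.unparse(RenameVariables().visit(ast.parse(source)))

print(perturb("def add(a, b):\n    total = a + b\n    return total"))
# -> def add(v0, v1):
#        v2 = v0 + v1
#        return v2
```
A robustness check in this spirit runs a code model on both the original and the perturbed prompt and compares outputs: since the perturbation preserves semantics, a robust model should succeed on both.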