CCTEST: Testing and Repairing Code Completion Systems
- URL: http://arxiv.org/abs/2208.08289v3
- Date: Mon, 8 May 2023 13:01:08 GMT
- Title: CCTEST: Testing and Repairing Code Completion Systems
- Authors: Zongjie Li, Chaozheng Wang, Zhibo Liu, Haoxuan Wang, Dong Chen, Shuai Wang, Cuiyun Gao
- Abstract summary: This research proposes CCTEST, a framework to test and repair code completion systems in blackbox settings.
With repairing, we show that the accuracy of code completion systems is notably increased by 40% and 67% in terms of BLEU score and Levenshtein edit similarity, respectively.
- Score: 27.176179982086804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code completion, a highly valuable topic in the software development domain,
has been increasingly promoted for use by recent advances in large language
models (LLMs). To date, visible LLM-based code completion frameworks such as
GitHub Copilot and GPT are trained using deep learning over vast quantities of
unstructured text and open source code. As the paramount component and the
cornerstone in daily programming tasks, code completion has largely boosted
professionals' efficiency in building real-world software systems. In contrast
to this flourishing market, we find that code completion systems often output
suspicious results, and to date, an automated testing and enhancement framework
for code completion systems is not available. This research proposes CCTEST, a
framework to test and repair code completion systems in blackbox settings.
CCTEST features a set of novel mutation strategies, namely program
structure-correlated (PSC) mutations, to generate mutated code completion
inputs. Then, it detects inconsistent outputs, representing possibly erroneous
cases, from all the completed code cases. Moreover, CCTEST repairs the code
completion outputs by selecting the output that mostly reflects the "average"
appearance of all output cases, as the final output of the code completion
systems. We detected a total of 33,540 inputs (with a true positive rate of
86%) that can trigger erroneous cases from eight popular LLM-based code
completion systems. With repairing, we show that the accuracy of code
completion systems is notably increased by 40% and 67% in terms of BLEU score
and Levenshtein edit similarity, respectively.
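The abstract describes a three-step pipeline: mutate a code completion input, detect inconsistency among the completions produced for the original and mutated variants, and repair by returning the candidate closest to the "average" of all outputs. The sketch below illustrates that idea under stated assumptions: the `complete()` callback stands in for a black-box completion system, the identifier-renaming mutation is only a placeholder for the paper's program-structure-correlated (PSC) mutations, and the inconsistency threshold is an assumed value.

```python
import difflib
from typing import Callable, List, Tuple

def edit_similarity(a: str, b: str) -> float:
    # Normalized string similarity; a stand-in for the Levenshtein edit
    # similarity used in the paper's evaluation.
    return difflib.SequenceMatcher(None, a, b).ratio()

def mutate_prefix(prefix: str) -> List[str]:
    # Placeholder mutations of the unfinished code. CCTEST's PSC mutations
    # are structure-aware; renaming/commenting here is only illustrative.
    return [
        prefix.replace("result", "res"),   # rename an identifier
        "# unrelated comment\n" + prefix,  # prepend a harmless comment
    ]

def test_and_repair(prefix: str,
                    complete: Callable[[str], str],
                    threshold: float = 0.6) -> Tuple[str, bool]:
    # Query the black-box completion system on the original and mutated
    # prefixes, flag inconsistent outputs, and return the candidate whose
    # average similarity to all others is highest (the "average" appearance).
    candidates = [complete(p) for p in [prefix] + mutate_prefix(prefix)]

    avg_sim = []
    for i, cand in enumerate(candidates):
        sims = [edit_similarity(cand, other)
                for j, other in enumerate(candidates) if j != i]
        avg_sim.append(sum(sims) / len(sims))

    # The input is reported as suspicious when some completion deviates
    # strongly from the rest; 0.6 is an assumed threshold, not the paper's.
    suspicious = min(avg_sim) < threshold
    repaired = candidates[max(range(len(candidates)), key=avg_sim.__getitem__)]
    return repaired, suspicious
```

A testing harness would run this over a corpus of unfinished programs, report how often the suspicious flag fires, and score the repaired completion against a reference with BLEU and edit similarity, as in the evaluation summarized above.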
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address these challenges.
Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- Prompt-based Code Completion via Multi-Retrieval Augmented Generation [15.233727939816388]
ProCC is a code completion framework that leverages prompt engineering and a contextual multi-armed bandit algorithm.
ProCC outperforms the state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite.
ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding a 5.6% improvement over our studied fine-tuned model.
arXiv Detail & Related papers (2024-05-13T07:56:15Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
To our knowledge, InfiBench is the first large-scale free-form question-answering (QA) benchmark for code.
It comprises 234 carefully selected, high-quality Stack Overflow questions spanning 15 programming languages.
We conduct a systematic evaluation of over 100 recent code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology [4.2990995991059275]
Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering.
We introduce CodePori, a novel system designed to automate code generation for large and complex software projects.
Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process.
arXiv Detail & Related papers (2024-02-02T13:42:50Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion [38.863871578280936]
We propose IRCoCo, a code completion-specific DRL-based fine-tuning framework.
We show that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task.
arXiv Detail & Related papers (2024-01-30T00:18:20Z)
- RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z)
- Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems [16.03416381009787]
Currently, large pre-trained language models are widely applied in neural code completion systems.
Around 70% of displayed code completions from GitHub Copilot are not accepted by developers.
We propose an early-rejection mechanism that turns down low-return prompts by predicting completion quality in advance.
arXiv Detail & Related papers (2022-09-13T12:43:41Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code.
We evaluate our approach on the code completion task for the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
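The ReACC entry above describes retrieval-augmented completion: find code that is lexically or semantically similar to the unfinished snippet, then let the generator condition on both the retrieved code and the original context. Below is a minimal sketch of that pipeline; the token-overlap retriever and the `generate()` callback are simplifying assumptions rather than ReACC's actual hybrid retriever or model.

```python
import re
from typing import Callable, List, Set

def lexical_tokens(code: str) -> Set[str]:
    # Crude lexical tokenization; ReACC pairs a lexical retriever with a
    # semantic dense retriever, which this overlap measure only stands in for.
    return set(re.findall(r"[A-Za-z_]\w*", code))

def retrieve_similar(query: str, corpus: List[str], k: int = 2) -> List[str]:
    # Rank corpus snippets by Jaccard overlap with the unfinished code.
    q = lexical_tokens(query)

    def score(snippet: str) -> float:
        s = lexical_tokens(snippet)
        return len(q & s) / max(len(q | s), 1)

    return sorted(corpus, key=score, reverse=True)[:k]

def retrieval_augmented_complete(prefix: str,
                                 corpus: List[str],
                                 generate: Callable[[str], str]) -> str:
    # Prepend retrieved snippets as extra context, then call the (assumed)
    # black-box code generator on the augmented prompt.
    context = "\n".join("# retrieved:\n" + s
                        for s in retrieve_similar(prefix, corpus))
    return generate(context + "\n" + prefix)
```

Swapping in a stronger retriever (for example, BM25 over a repository index or a dense code encoder) and a real completion model preserves the same structure.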