CodeT: Code Generation with Generated Tests
- URL: http://arxiv.org/abs/2207.10397v1
- Date: Thu, 21 Jul 2022 10:18:37 GMT
- Title: CodeT: Code Generation with Generated Tests
- Authors: Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang
Lou, Weizhu Chen
- Abstract summary: We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
- Score: 49.622590050797236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a programming problem, pre-trained language models such as Codex have
demonstrated the ability to generate multiple different code solutions via
sampling. However, selecting a correct or best solution from those samples
still remains a challenge. While an easy way to verify the correctness of a
code solution is through executing test cases, producing high-quality test
cases is prohibitively expensive. In this paper, we explore the use of
pre-trained language models to automatically generate test cases, calling our
method CodeT: Code generation with generated Tests. CodeT executes the code
solutions using the generated test cases, and then chooses the best solution
based on a dual execution agreement with both the generated test cases and
other generated solutions. We evaluate CodeT on five different pre-trained
models with both HumanEval and MBPP benchmarks. Extensive experimental results
demonstrate CodeT can achieve significant, consistent, and surprising
improvements over previous methods. For example, CodeT improves the pass@1 on
HumanEval to 65.8%, an increase of absolute 18.8% on the code-davinci-002
model, and an absolute 20+% improvement over previous state-of-the-art results.
Related papers
- Code Agents are State of the Art Software Testers [10.730852617039451]
We investigate the capability of LLM-based Code Agents for formalizing user issues into test cases.
We propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests.
We find that LLMs generally perform surprisingly well at generating relevant test cases with Code Agents designed for code repair.
arXiv Detail & Related papers (2024-06-18T14:54:37Z) - Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation [11.155351560550853]
This paper introduces Multi-Agent Assistant Code Generation (AgentCoder)
AgentCoder is a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent.
Our experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models.
arXiv Detail & Related papers (2023-12-20T13:22:41Z) - Functional Overlap Reranking for Neural Code Generation [6.665515707408405]
We introduce SRank, a novel reranking strategy for selecting the best solutions from code generation.
By quantifying the functional overlap between solution clusters, our approach provides a better ranking strategy for code solutions.
Empirical results show that our method achieves remarkable results on the pass@1 score.
arXiv Detail & Related papers (2023-10-16T22:20:31Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyper parameters.
arXiv Detail & Related papers (2022-11-29T18:56:33Z) - Soft-Labeled Contrastive Pre-training for Function-level Code
Representation [127.71430696347174]
We present textbfSCodeR, a textbfSoft-labeled contrastive pre-training framework with two positive sample construction methods.
Considering the relevance between codes in a large-scale code corpus, the soft-labeled contrastive pre-training can obtain fine-grained soft-labels.
SCodeR achieves new state-of-the-art performance on four code-related tasks over seven datasets.
arXiv Detail & Related papers (2022-10-18T05:17:37Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five defects4j projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.