CodeT: Code Generation with Generated Tests
- URL: http://arxiv.org/abs/2207.10397v1
- Date: Thu, 21 Jul 2022 10:18:37 GMT
- Title: CodeT: Code Generation with Generated Tests
- Authors: Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang
Lou, Weizhu Chen
- Abstract summary: We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
- Score: 49.622590050797236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a programming problem, pre-trained language models such as Codex have
demonstrated the ability to generate multiple different code solutions via
sampling. However, selecting a correct or best solution from those samples
still remains a challenge. While an easy way to verify the correctness of a
code solution is through executing test cases, producing high-quality test
cases is prohibitively expensive. In this paper, we explore the use of
pre-trained language models to automatically generate test cases, calling our
method CodeT: Code generation with generated Tests. CodeT executes the code
solutions using the generated test cases, and then chooses the best solution
based on a dual execution agreement with both the generated test cases and
other generated solutions. We evaluate CodeT on five different pre-trained
models with both HumanEval and MBPP benchmarks. Extensive experimental results
demonstrate CodeT can achieve significant, consistent, and surprising
improvements over previous methods. For example, CodeT improves the pass@1 on
HumanEval to 65.8%, an increase of absolute 18.8% on the code-davinci-002
model, and an absolute 20+% improvement over previous state-of-the-art results.
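The abstract describes the dual execution agreement only at a high level. Below is a minimal Python sketch of one way such a consensus scheme can work, assuming that solutions passing exactly the same set of generated tests are grouped together and each group is scored by (number of agreeing solutions) x (number of tests passed); the function names and the use of bare exec are illustrative assumptions, not the paper's reference implementation.

```python
# Hedged sketch of a dual-execution-agreement style ranking:
# run every sampled solution against every generated test, group solutions
# that agree (pass the same tests), and score each group by
# (#agreeing solutions) * (#tests passed).
from collections import defaultdict
from typing import Dict, List, Tuple


def run_test(solution_code: str, test_code: str) -> bool:
    """Return True if one generated test (an assert statement) passes against the solution."""
    env: dict = {}
    try:
        exec(solution_code, env)   # define the candidate function
        exec(test_code, env)       # run the generated assertion
        return True
    except Exception:
        return False


def rank_by_dual_agreement(solutions: List[str], tests: List[str]) -> List[Tuple[str, int]]:
    """Group solutions by the exact set of tests they pass, then score each group."""
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for sol in solutions:
        passed = frozenset(i for i, t in enumerate(tests) if run_test(sol, t))
        groups[passed].append(sol)

    ranked = []
    for passed, members in groups.items():
        score = len(members) * len(passed)
        ranked.append((members[0], score))   # one representative per agreement group
    return sorted(ranked, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    solutions = [
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return a - b",   # faulty sample
    ]
    tests = [
        "assert add(1, 2) == 3",
        "assert add(0, 0) == 0",
        "assert add(-1, 1) == 0",
    ]
    best, score = rank_by_dual_agreement(solutions, tests)[0]
    print(score)  # 6: two agreeing solutions x three passing tests
    print(best)
```

In practice the generated tests and solutions would both come from a pre-trained model and would be executed in a sandbox rather than with bare exec; the sketch only illustrates the agreement-based scoring idea.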
Related papers
- CodeDPO: Aligning Code Models with Self Generated and Verified Source Code [52.70310361822519]
We propose CodeDPO, a framework that integrates preference learning into code generation to improve two key code preference factors: code correctness and efficiency.
CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases.
arXiv Detail & Related papers (2024-10-08T01:36:15Z)
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial test authoring, test suite completion, and code coverage improvements.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- AutoTest: Evolutionary Code Solution Selection with Test Cases [1.4582633500696451]
This study proposes AutoTest, a novel technique that combines automated test case generation with code solution execution.
The HumanEval dataset consists of 164 programming problems, and AutoTest achieves approximately a 10% improvement over the baseline method in terms of pass@1 score.
arXiv Detail & Related papers (2024-08-22T04:38:41Z)
- CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks [35.68087697258125]
We present CodeBenchGen, a framework to create scalable execution-based benchmarks from naturally occurring code sources.
Specifically, we leverage a large language model (LLM) to sandbox arbitrary pieces of code into evaluation examples.
To demonstrate the solvability of the examples in Exec-CSN, the benchmark built with this framework, we present a human study showing that 81.3% of the examples can be solved by humans.
arXiv Detail & Related papers (2024-03-31T05:20:53Z)
- Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on the executed code segments, masking the unexecuted ones to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood.
Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only.
Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters (see the sketch after this list).
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five Defects4J projects, generating 25K passing test cases that cover 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
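The Coder-Reviewer entry above mentions reranking sampled programs with model likelihood. The following is a minimal, hypothetical sketch of such likelihood-based reranking; the coder_logprob and reviewer_logprob callables are placeholders for querying a code language model for log p(program | instruction) and log p(instruction | program), and combining them by simple summation is an illustrative assumption rather than the paper's exact scoring rule.

```python
# Hedged sketch: rerank sampled programs by combining "coder" and "reviewer"
# log-likelihoods obtained from a code language model.
from typing import Callable, List


def rerank(
    instruction: str,
    programs: List[str],
    coder_logprob: Callable[[str, str], float],     # hypothetical: log p(program | instruction)
    reviewer_logprob: Callable[[str, str], float],  # hypothetical: log p(instruction | program)
) -> List[str]:
    """Order sampled programs by the sum of coder and reviewer log-likelihoods."""
    scored = [
        (coder_logprob(instruction, p) + reviewer_logprob(instruction, p), p)
        for p in programs
    ]
    return [p for _, p in sorted(scored, key=lambda x: x[0], reverse=True)]
```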