Learning Deep Semantics for Test Completion
- URL: http://arxiv.org/abs/2302.10166v1
- Date: Mon, 20 Feb 2023 18:53:56 GMT
- Title: Learning Deep Semantics for Test Completion
- Authors: Pengyu Nie, Rahul Banerjee, Junyi Jessy Li, Raymond J. Mooney, Milos
Gligoric
- Abstract summary: We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test completion.
- Score: 46.842174440120196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Writing tests is a time-consuming yet essential task during software
development. We propose to leverage recent advances in deep learning for text
and code generation to assist developers in writing tests. We formalize the
novel task of test completion to automatically complete the next statement in a
test method based on the context of prior statements and the code under test.
We develop TeCo -- a deep learning model using code semantics for test
completion. The key insight underlying TeCo is that predicting the next
statement in a test method requires reasoning about code execution, which is
hard to do with only syntax-level data that existing code completion models
use. TeCo extracts and uses six kinds of code semantics data, including the
execution result of prior statements and the execution context of the test
method. To provide a testbed for this new task, as well as to evaluate TeCo, we
collect a corpus of 130,934 test methods from 1,270 open-source Java projects.
Our results show that TeCo achieves an exact-match accuracy of 18%, which is
29% higher than the best baseline using syntax-level data only. When measuring
the functional correctness of the generated next statement, TeCo generates
runnable code in 29% of cases, compared to 18% for the best baseline.
Moreover, TeCo is significantly better than prior work on test oracle
generation.
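To make the task concrete, here is a minimal hypothetical illustration (the StringUtils class and the test are invented for exposition, not drawn from the TeCo corpus; JUnit 4 is assumed). Given the code under test and the prior statements of the test method, the model must predict the next statement:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Code under test (hypothetical; not from the TeCo corpus).
class StringUtils {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}

public class StringUtilsTest {
    @Test
    public void testReverse() {
        // Prior statements: the context the model conditions on.
        String input = "abc";
        String actual = StringUtils.reverse(input);
        // Next statement: what the model must predict, e.g. the assertion below.
        assertEquals("cba", actual);
    }
}
```

Execution-level signals of the kind TeCo extracts, such as the result of running the prior statements, matter here because the correct assertion depends on the runtime value of `actual`, not just on the surrounding syntax.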
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial test authoring, test suite completion, and code coverage improvement.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Test-Driven Development for Code Generation [0.850206009406913]
Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements.
This paper investigates if and how Test-Driven Development (TDD) can be incorporated into AI-assisted code-generation processes.
arXiv Detail & Related papers (2024-02-21T04:10:12Z)
- TDD Without Tears: Towards Test Case Generation from Requirements through Deep Reinforcement Learning [22.331330777536046]
Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
arXiv Detail & Related papers (2024-01-15T10:21:58Z)
- CAT-LM: Training Language Models on Aligned Code And Tests [19.526181671936243]
Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected.
We propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 billion parameters, trained on a corpus of Python and Java projects.
arXiv Detail & Related papers (2023-10-02T19:52:22Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the candidate code solutions against the generated test cases and then chooses the best solution; a minimal sketch of this selection step follows this entry.
We evaluate CodeT with five different pre-trained models on both the HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
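As a reference point, below is a minimal sketch of the selection step, assuming the pass/fail outcome of each candidate solution on the generated tests is already available; the solution names and outcomes are hypothetical, and CodeT's actual ranking (based on execution agreement between solutions and tests) is more sophisticated than this simple pass count.

```java
import java.util.List;
import java.util.Map;

// Sketch: rank candidate solutions by how many generated tests they pass.
// Assumes each candidate was already executed against the generated tests.
public class SelectByGeneratedTests {

    static String pickBest(Map<String, List<Boolean>> resultsBySolution) {
        String best = null;
        long bestPassed = -1;
        for (Map.Entry<String, List<Boolean>> e : resultsBySolution.entrySet()) {
            long passed = e.getValue().stream().filter(b -> b).count();
            if (passed > bestPassed) {
                bestPassed = passed;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical pass/fail outcomes of two candidates on three tests.
        Map<String, List<Boolean>> results = Map.of(
                "solutionA", List.of(true, true, false),
                "solutionB", List.of(true, true, true));
        System.out.println(pickBest(results)); // prints "solutionB"
    }
}
```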
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code; the lexical side of such a retriever is sketched after this entry.
We evaluate our approach on the code completion task in Python and Java, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
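Below is a minimal sketch of the lexical side of such a retriever, assuming a plain token-overlap (Jaccard) score over an in-memory corpus; all names are hypothetical, and ReACC's hybrid retriever additionally uses a semantic (dense) component not shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: score corpus snippets by token overlap (Jaccard) with the
// unfinished code, then return the best match for prompt augmentation.
public class LexicalRetriever {

    static Set<String> tokens(String code) {
        return new HashSet<>(Arrays.asList(code.split("\\W+")));
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    static String retrieve(String query, List<String> corpus) {
        Set<String> q = tokens(query);
        return corpus.stream()
                .max(Comparator.comparingDouble(s -> jaccard(q, tokens(s))))
                .orElse("");
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "int sum(int[] a) { int s = 0; for (int x : a) s += x; return s; }",
                "String join(List<String> p) { return String.join(\",\", p); }");
        String context = "int total(int[] values) { int s = 0;";
        // The retrieved snippet would be prepended to the model's prompt.
        System.out.println(retrieve(context, corpus));
    }
}
```

In a full pipeline, the retrieved snippet would be concatenated with the unfinished code to form the prompt for the completion model.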
- CoSQA: 20,000+ Web Queries for Code Search and Question Answering [63.92224685262063]
The CoSQA dataset includes 20,604 labels for pairs of natural language queries and code snippets.
We introduce a contrastive learning method dubbed CoCLR to enhance query-code matching; a generic form of such an objective appears after this entry.
We show that, when evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves code question answering accuracy by 5.1%.
arXiv Detail & Related papers (2021-05-27T15:37:21Z)
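For orientation, a generic InfoNCE-style contrastive objective for query-code matching is shown below; the abstract does not state CoCLR's exact loss, so treat this as the common template such methods build on. Here q is a query embedding, c^+ its matching code snippet, C the candidate set (e.g., in-batch negatives plus c^+), sim a similarity function, and tau a temperature.

```latex
% Generic InfoNCE-style contrastive objective (notation ours, not CoCLR's
% exact formulation): pull the matching code c^+ toward the query q while
% pushing away the other candidates in C.
\mathcal{L}(q) = -\log
  \frac{\exp\!\left(\mathrm{sim}(q, c^{+}) / \tau\right)}
       {\sum_{c \in \mathcal{C}} \exp\!\left(\mathrm{sim}(q, c) / \tau\right)}
```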
- Semantic Evaluation for Text-to-SQL with Distilled Test Suites [46.42548219378393]
We propose test suite accuracy to approximate semantic accuracy for Text-to-SQL models; the metric is formalized after this entry.
We use our proposed method to evaluate 21 models submitted to the Spider leaderboard and manually verify that it is always correct on 100 examples.
arXiv Detail & Related papers (2020-10-06T16:04:12Z)
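The metric can be stated compactly; the notation below is ours, reconstructed from the abstract. A predicted query p_i counts as correct iff it returns the same result as the gold query g_i on every database D in the distilled test suite S:

```latex
% Test suite accuracy (notation ours, reconstructed from the abstract):
% exec(p, D) denotes the result of executing query p on database D.
\mathrm{Acc}_{\mathrm{ts}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \mathbf{1}\big[\,\forall D \in S:\; \mathrm{exec}(p_i, D) = \mathrm{exec}(g_i, D)\,\big]
```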
- Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five Defects4J projects, generating 25K passing test cases that cover 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)