Observation-based unit test generation at Meta
- URL: http://arxiv.org/abs/2402.06111v1
- Date: Fri, 9 Feb 2024 00:34:39 GMT
- Title: Observation-based unit test generation at Meta
- Authors: Nadia Alshahwan, Mark Harman, Alexandru Marginean, Rotem Tal, Eddy Wang
- Abstract summary: TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
- Score: 52.4716552057909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TestGen automatically generates unit tests, carved from serialized
observations of complex objects, observed during app execution. We describe the
development and deployment of TestGen at Meta. In particular, we focus on the
scalability challenges overcome during development in order to deploy
observation-based test carving at scale in industry. So far, TestGen has landed
518 tests into production, which have been executed 9,617,349 times in
continuous integration, finding 5,702 faults. Meta is currently in the process
of more widespread deployment. Our evaluation reveals that, when carving its
observations from 4,361 reliable end-to-end tests, TestGen was able to generate
tests for at least 86% of the classes covered by end-to-end tests. Testing on
16 Kotlin Instagram app-launch-blocking tasks demonstrated that the TestGen
tests would have trapped 13 of these before they became launch blocking.
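As a rough illustration of the carving idea described above, the sketch below serializes the state, arguments, and observed return value of an object exercised during execution, then replays them as the oracle of a generated unit test. This is a minimal, self-contained Python sketch under assumed names (DiscountCalculator, carve_observation); it is not Meta's TestGen implementation.

```python
# Minimal sketch of observation-based test carving
# (hypothetical names, not Meta's TestGen implementation).
import json
import unittest


class DiscountCalculator:
    """Example production class whose behaviour we want to pin down."""

    def __init__(self, rate: float):
        self.rate = rate

    def apply(self, price: float) -> float:
        return round(price * (1.0 - self.rate), 2)


def carve_observation(instance, method_name, args, path):
    """Serialize the receiver state, arguments, and observed return value."""
    result = getattr(instance, method_name)(*args)
    observation = {
        "state": vars(instance),   # serialized object state
        "method": method_name,
        "args": list(args),
        "expected": result,        # observed behaviour becomes the oracle
    }
    with open(path, "w") as fh:
        json.dump(observation, fh)
    return result


class CarvedDiscountTest(unittest.TestCase):
    """Shape of a unit test generated from a carved observation."""

    def test_apply_matches_observed_behaviour(self):
        with open("observation.json") as fh:
            obs = json.load(fh)
        instance = DiscountCalculator(**obs["state"])   # rebuild the receiver
        actual = getattr(instance, obs["method"])(*obs["args"])
        self.assertEqual(actual, obs["expected"])       # regression oracle


if __name__ == "__main__":
    # In TestGen's setting the observation would be carved during an
    # end-to-end test run rather than constructed by hand, as here.
    carve_observation(DiscountCalculator(0.1), "apply", (200.0,), "observation.json")
    unittest.main()
```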
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial tests authoring, test suite completion, and code coverage improvements.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z)
- Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA [47.29324864511411]
Flaky tests fail seemingly at random without changes to the code.
We study characteristics of tests and the test environment that potentially impact test flakiness.
arXiv Detail & Related papers (2024-09-16T07:52:09Z)
- Leveraging Large Language Models for Enhancing the Understandability of Generated Unit Tests [4.574205608859157]
We introduce UTGen, which combines search-based software testing and large language models to enhance the understandability of automatically generated test cases.
We observe that participants working on assignments with UTGen test cases fix up to 33% more bugs and use up to 20% less time when compared to baseline test cases.
arXiv Detail & Related papers (2024-08-21T15:35:34Z)
- Automated Unit Test Improvement using Large Language Models at Meta [44.87533111512982]
This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests.
We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms.
arXiv Detail & Related papers (2024-02-14T13:43:14Z)
- Do Automatic Test Generation Tools Generate Flaky Tests? [12.813573907094074]
The prevalence and nature of flaky tests produced by test generation tools remain largely unknown.
We generate tests using EvoSuite (Java) and Pynguin (Python) and execute each test 200 times.
Our results show that flakiness is at least as common in generated tests as in developer-written tests; a minimal re-execution sketch appears after this list.
arXiv Detail & Related papers (2023-10-08T16:44:27Z)
- Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution; a simplified selection sketch appears after this list.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- Unit Test Case Generation with Transformers and Focal Context [10.220204860586582]
AthenaTest aims to generate unit test cases by learning from real-world focal methods and developer-written test cases.
We introduce Methods2Test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java.
We evaluate AthenaTest on five Defects4J projects, generating 25K passing test cases covering 43.7% of the focal methods with only 30 attempts.
arXiv Detail & Related papers (2020-09-11T18:57:36Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it; a minimal invariance-style sketch appears after this list.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
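Relating to the flaky-test study above, which re-executes each generated test 200 times, the sketch below flags a test as flaky when repeated runs produce both passes and failures. The helper name is_flaky and the toy test are hypothetical, not the study's actual harness.

```python
# Minimal sketch of flakiness detection by repeated execution
# (hypothetical helper, not the study's actual harness).
import random


def is_flaky(test_fn, runs: int = 200) -> bool:
    """A test is flagged as flaky if it both passes and fails across runs."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except Exception:
            outcomes.add("fail")   # any raised assertion or error counts as a failure
    return outcomes == {"pass", "fail"}


def sometimes_failing_test():
    # Stand-in for a generated test with a timing or ordering dependency.
    assert random.random() > 0.05


if __name__ == "__main__":
    print("flaky:", is_flaky(sometimes_failing_test))
```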
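For the CodeT entry, which executes candidate solutions against generated tests and keeps the best one, the sketch below selects the candidate that passes the most tests. This is a simplified pass-count criterion with hypothetical names; it does not reproduce CodeT's actual dual-execution agreement scoring.

```python
# Simplified sketch of choosing among candidate solutions by running
# generated tests (not CodeT's exact ranking scheme).
def candidate_a(x):            # buggy candidate
    return x * x + 1


def candidate_b(x):            # candidate matching the intended spec
    return x * x


generated_tests = [
    lambda f: f(0) == 0,
    lambda f: f(2) == 4,
    lambda f: f(-3) == 9,
]


def select_best(candidates, tests):
    """Return the candidate that passes the most generated tests."""
    def score(fn):
        passed = 0
        for test in tests:
            try:
                passed += bool(test(fn))
            except Exception:
                pass               # a crashing test counts as a failure
        return passed
    return max(candidates, key=score)


if __name__ == "__main__":
    best = select_best([candidate_a, candidate_b], generated_tests)
    print("selected:", best.__name__)   # expected: candidate_b
```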
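For the CheckList entry, the sketch below shows one test type from its matrix, an invariance-style behavioural test: swapping an irrelevant slot in a template should not change the model's prediction. It uses a toy keyword-based sentiment function and plain Python; it is not the CheckList library's API.

```python
# Minimal invariance-style behavioural test in the spirit of CheckList
# (plain Python stand-in, not the CheckList library's API).
def toy_sentiment(text: str) -> str:
    """Stand-in model: keyword-based sentiment, used only for illustration."""
    return "negative" if "terrible" in text else "positive"


def invariance_test(model, template: str, fillers) -> list:
    """Return fillers whose prediction differs from the first filler's."""
    predictions = {name: model(template.format(name=name)) for name in fillers}
    baseline = next(iter(predictions.values()))
    return [name for name, pred in predictions.items() if pred != baseline]


if __name__ == "__main__":
    failures = invariance_test(
        toy_sentiment,
        "{name} said the service was terrible.",
        ["Alice", "Bob", "Priya"],
    )
    print("invariance violations:", failures)   # expected: []
```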
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.