UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large
Language Models for Program Testing
- URL: http://arxiv.org/abs/2402.03396v1
- Date: Sun, 4 Feb 2024 22:48:05 GMT
- Title: UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large
Language Models for Program Testing
- Authors: Yifeng He, Jiabo Huang, Yuyang Rong, Yiwen Guo, Ethan Wang, Hao Chen
- Abstract summary: We present a large-scale dataset UniTSyn, which is capable of enhancing the prowess of LLMs for Unit Test Synthesis.
By leveraging Language Server Protocol, UniSyn achieves the challenging goal of collecting focal-test pairs without per-project execution setups or per-language setups.
Experiments demonstrate that, by building an autoregressive model based on UniTSyn, we can achieve significant benefits in learning and understanding unit test representations.
- Score: 27.45301385265713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The remarkable capability of large language models (LLMs) in generating
high-quality code has drawn increasing attention in the software testing
community. However, existing code LLMs often demonstrate unsatisfactory
capabilities in generating accurate and complete tests since they were trained
on code snippets collected without differentiating between code for testing
purposes and other code. In this paper, we present a large-scale dataset
UniTSyn, which is capable of enhancing the prowess of LLMs for Unit Test
Synthesis. Associating tests with the tested functions is crucial for LLMs to
infer the expected behavior and the logic paths to be verified. By leveraging
Language Server Protocol, UniTSyn achieves the challenging goal of collecting
focal-test pairs without per-project execution setups or per-language
heuristics that tend to be fragile and difficult to scale. It contains 2.7
million focal-test pairs across five mainstream programming languages, making
it possible to be utilized for enhancing the test generation ability of LLMs.
The details of UniTSyn can be found in Table 1. Our experiments demonstrate
that, by building an autoregressive model based on UniTSyn, we can achieve
significant benefits in learning and understanding unit test representations,
resulting in improved generation accuracy and code coverage across all
evaluated programming languages. Code and data will be publicly available.
Related papers
- UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge.
We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process.
Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z) - A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability [0.8287206589886879]
We propose the Generated Benchmark from Control-Flow Structure and Variable Usage Composition (GBCV) approach to evaluate large language models (LLMs)
By leveraging basic control-flow structures and variable usage, GBCV provides a flexible framework to create a spectrum of programs ranging from simple to complex.
Our findings indicate that GPT-4o performs better on complex program structures, while all models effectively detect boundary values in simple conditions but face challenges with arithmetic computations.
arXiv Detail & Related papers (2025-02-05T03:51:44Z) - Improving the Readability of Automatically Generated Tests using Large Language Models [7.7149881834358345]
We propose to combine the effectiveness of search-based generators with the readability of LLM generated tests.
Our approach focuses on improving test and variable names produced by search-based tools, while keeping their semantics unchanged.
arXiv Detail & Related papers (2024-12-25T09:08:53Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - Large Language Models for cross-language code clone detection [3.5202378300682162]
Cross-lingual code clone detection has gained traction within the software engineering community.
Inspired by the significant advances in machine learning, this paper revisits cross-lingual code clone detection.
We evaluate the performance of five (05) Large Language Models (LLMs) and eight prompts (08) for the identification of cross-lingual code clones.
arXiv Detail & Related papers (2024-08-08T12:57:14Z) - Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.
Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.
We propose a textbfCase2Code task by exploiting the expressiveness and correctness of programs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Large Language Models as Test Case Generators: Performance Evaluation and Enhancement [3.5398126682962587]
We study how well Large Language Models can generate high-quality test cases.
We propose a multi-agent framework called emphTestChain that decouples the generation of test inputs and test outputs.
Our results indicate that TestChain outperforms the baseline by a large margin.
arXiv Detail & Related papers (2024-04-20T10:27:01Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - The Program Testing Ability of Large Language Models for Code [27.590499335039972]
Large language models (LLMs) for code like CodeX and CodeT5+ demonstrate tremendous promise in achieving code intelligence.
We show a series of intriguing properties of these models and demonstrate how program testing ability of LLMs can be improved.
arXiv Detail & Related papers (2023-10-09T13:55:45Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs(4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.