Learning to Encode and Classify Test Executions
- URL: http://arxiv.org/abs/2001.02444v2
- Date: Mon, 2 Oct 2023 13:48:20 GMT
- Title: Learning to Encode and Classify Test Executions
- Authors: Foivos Tsimpourlas, Ajitha Rajan, Miltiadis Allamanis
- Abstract summary: The goal in this paper is to solve the test oracle problem in a way that is general, scalable and accurate.
We label a small fraction of the execution traces with their verdict of pass or fail.
We use labelled traces to train a neural network (NN) model to learn to distinguish runtime patterns for passing versus failing executions.
- Score: 14.67675979776677
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The challenge of automatically determining the correctness of test executions
is referred to as the test oracle problem and is one of the key remaining
issues for automated testing. The goal in this paper is to solve the test
oracle problem in a way that is general, scalable and accurate. To achieve
this, we use supervised learning over test execution traces. We label a small
fraction of the execution traces with their verdict of pass or fail. We use the
labelled traces to train a neural network (NN) model to learn to distinguish
runtime patterns for passing versus failing executions for a given program. Our
approach for building this NN model involves the following steps:
1. Instrument the program to record execution traces as sequences of method invocations and global state.
2. Label a small fraction of the execution traces with their verdicts.
3. Design a NN component that embeds the information in execution traces into fixed-length vectors.
4. Design a NN model that uses the trace information for classification.
5. Evaluate the inferred classification model on unseen execution traces from the program.
We evaluate our approach using case studies from different application domains:
1. a module from the Ethereum blockchain,
2. a module from the PyTorch deep learning framework,
3. components of the Microsoft SEAL encryption library,
4. the Sed stream editor,
5. a value-pointer library, and
6. nine network protocols from the Linux packet identifier, L7-Filter.
We found that the classification models for all subject programs achieved high precision, recall and specificity (over 95%) while training on an average of only 9% of the total traces. Our experiments
show that the proposed neural network model is highly effective as a test
oracle and is able to learn runtime patterns to distinguish passing and failing
test executions for systems and tests from different application domains.
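The five steps in the abstract map naturally onto a small sequence model. The sketch below is a minimal illustration under assumptions of mine, not the authors' implementation: it assumes traces have already been collected as integer-encoded sequences of method-invocation events, and uses a hypothetical LSTM encoder plus a linear head to produce a pass/fail verdict.

```python
import torch
import torch.nn as nn

class TraceClassifier(nn.Module):
    """Minimal sketch of steps 3-4: embed a trace, then classify it.

    Assumes each trace is a sequence of integer event ids (e.g. method
    invocations); the paper's model also encodes arguments and global
    state, which this sketch omits.
    """

    def __init__(self, n_events: int, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(n_events, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits for {pass, fail}

    def forward(self, traces: torch.Tensor) -> torch.Tensor:
        x = self.embed(traces)         # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.encoder(x)  # h_n: (1, batch, hidden)
        return self.head(h_n[-1])      # fixed-length vector -> verdict

# Step 5: train on the small labelled fraction, evaluate on unseen traces.
model = TraceClassifier(n_events=500)
logits = model(torch.randint(0, 500, (8, 120)))  # 8 traces of 120 events
verdicts = logits.argmax(dim=-1)                 # 0 = pass, 1 = fail
```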
Related papers
- Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs [11.347234752942684]
This paper introduces a novel machine learning-based framework called CodeFlow to predict code coverage and detect runtime errors.
CodeFlow represents all possible execution paths and the relationships between different statements.
It learns dynamic dependencies through execution traces, which reflect the impacts among statements during execution.
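"Dynamic dependencies from execution traces" usually means edges from the last writer of a variable to each later reader. The sketch below shows that classic construction; CodeFlow's actual graph is richer, and the event format here is illustrative.

```python
def dynamic_dependencies(trace):
    """Derive dynamic data-dependency edges from an execution trace.

    `trace` is a list of (statement_id, reads, writes) events. An edge
    (src, dst) means dst read a value that src last wrote. Illustrative
    only; CodeFlow's graph construction may differ.
    """
    last_writer = {}  # variable -> statement that last wrote it
    edges = set()
    for stmt, reads, writes in trace:
        for var in reads:
            if var in last_writer:
                edges.add((last_writer[var], stmt))
        for var in writes:
            last_writer[var] = stmt
    return edges

# s1: x = 1; s2: y = x + 1; s3: print(y)
print(dynamic_dependencies([("s1", [], ["x"]),
                            ("s2", ["x"], ["y"]),
                            ("s3", ["y"], [])]))
# edges: ('s1', 's2') and ('s2', 's3')
```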
arXiv Detail & Related papers (2024-08-05T20:32:00Z)
- TESTEVAL: Benchmarking Large Language Models for Test Case Generation [15.343859279282848]
We propose TESTEVAL, a novel benchmark for test case generation with large language models (LLMs).
We collect 210 Python programs from an online programming platform, LeetCode, and design three different tasks: overall coverage, targeted line/branch coverage, and targeted path coverage.
We find that generating test cases to cover specific program lines/branches/paths is still challenging for current LLMs.
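Checking a targeted-coverage task boils down to running the generated test and asking whether the requested line executed. One way to do that with coverage.py is sketched below; the function, file, and line number are hypothetical stand-ins, and coverage.py may need absolute paths when looking up a measured file.

```python
import coverage

def hits_target_line(test_fn, source_file: str, target_line: int) -> bool:
    """Run one generated test and check targeted line coverage.

    `test_fn`, `source_file`, and `target_line` are illustrative stand-ins
    for a TESTEVAL-style task: did the test execute the requested line?
    """
    cov = coverage.Coverage(branch=True)
    cov.start()
    try:
        test_fn()
    finally:
        cov.stop()
    # lines() returns None if the file was not measured at all
    executed = cov.get_data().lines(source_file) or []
    return target_line in executed
```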
arXiv Detail & Related papers (2024-06-06T22:07:50Z)
- Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
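The selection step that lifts pass@1 is simple once the ranker exists: score every sampled program and submit the top one. A schematic (the names are illustrative, not the paper's API; the learned scoring model is the hard part):

```python
def pick_with_ranker(samples, ranker_score):
    """Return the candidate program the ranker deems most likely correct.

    `samples` are candidate programs; `ranker_score` is a learned model
    that predicts correctness without executing anything. pass@1 is then
    measured on this single chosen candidate.
    """
    return max(samples, key=ranker_score)
```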
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
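pass@k here is presumably the standard unbiased estimator from the Codex paper (Chen et al., 2021): draw n samples, count the c correct ones, and compute the probability that at least one of k randomly chosen samples is correct.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k), computed stably."""
    if n - c < k:
        return 1.0  # fewer incorrect samples than k: a hit is guaranteed
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=20, c=3, k=5))  # chance the top 5 of 20 samples has a hit
```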
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
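The regularizer is, in spirit, an agreement loss between a model's predictions for the same unlabeled input under two different prompt phrasings. The paper's exact formulation may differ from this symmetric-KL sketch.

```python
import torch.nn.functional as F

def prompt_consistency_loss(logits_a, logits_b):
    """Penalize disagreement between two prompt phrasings of one task.

    `logits_a` / `logits_b`: model outputs for the same unlabeled input
    under two different prompts. Symmetric KL between the two predicted
    distributions; illustrative, not necessarily the paper's loss.
    """
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
                  + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))
```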
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Absolute Wrong Makes Better: Boosting Weakly Supervised Object Detection via Negative Deterministic Information [54.35679298764169]
Weakly supervised object detection (WSOD) is a challenging task, in which image-level labels are used to train an object detector.
This paper focuses on identifying and fully exploiting the deterministic information in WSOD.
We propose a negative deterministic information (NDI) based method for improving WSOD, namely NDI-WSOD.
arXiv Detail & Related papers (2022-04-21T12:55:27Z)
- On the Evaluation of Sequential Machine Learning for Network Intrusion Detection [3.093890460224435]
We propose a detailed methodology to extract temporal sequences of NetFlows that denote patterns of malicious activities.
We then apply this methodology to compare the efficacy of sequential learning models against traditional static learning models.
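Extracting temporal sequences from flat NetFlow records typically means grouping by endpoints, ordering by time, and windowing. A generic sketch follows; the field names and window size are illustrative, not the paper's schema.

```python
from collections import defaultdict

def flow_sequences(flows, window=5):
    """Group NetFlow records into temporal sequences per (src, dst) pair.

    `flows`: iterable of dicts with illustrative keys 'src', 'dst', 'ts'.
    Returns fixed-length sliding windows, each a candidate input for a
    sequential model (vs. one flow at a time for a static model).
    """
    by_pair = defaultdict(list)
    for f in sorted(flows, key=lambda f: f["ts"]):
        by_pair[(f["src"], f["dst"])].append(f)
    return [seq[i:i + window]
            for seq in by_pair.values()
            for i in range(len(seq) - window + 1)]
```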
arXiv Detail & Related papers (2021-06-15T08:29:28Z)
- Active Testing: Sample-Efficient Model Evaluation [39.200332879659456]
We introduce active testing: a new framework for sample-efficient model evaluation.
Active testing addresses the high cost of labelling test data by carefully selecting which test points to label.
Selecting only informative points biases the naive loss estimate; we show how to remove that bias while reducing the variance of the estimator.
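One standard way to see the bias correction: if test points are sampled in proportion to an acquisition score, re-weighting their losses yields an unbiased risk estimate whose variance shrinks when the acquisition is good. The paper's actual estimator is more refined than this sketch, which assumes sampling with replacement from the unlabeled pool.

```python
import numpy as np

def weighted_risk(losses, q, pool_size):
    """Importance-weighted test risk under non-uniform point selection.

    Points were sampled with probabilities `q` (proportional to an
    acquisition score) from a pool of `pool_size` unlabeled points.
    Re-weighting each loss by 1 / (pool_size * q) removes the selection
    bias: the expectation equals the uniform mean loss over the pool.
    """
    losses, q = np.asarray(losses), np.asarray(q)
    return np.mean(losses / (pool_size * q))
```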
arXiv Detail & Related papers (2021-03-09T10:20:49Z)
- Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
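The core device in IPA-GNN is a *soft* instruction pointer: a probability distribution over statements whose mass flows along learned branch decisions. Below is a toy numeric step of that pointer update, not the full architecture; the matrix values stand in for what the network would predict.

```python
import torch

def pointer_step(p, succ):
    """One soft instruction-pointer update, in the spirit of IPA-GNN.

    p:    (n_stmts,) probability mass over statements (the soft pointer).
    succ: (n_stmts, n_stmts) row-stochastic matrix; succ[i, j] is the
          learned probability that control flows from statement i to j
          (two nonzeros per row at a branch, one for straight-line code).
    """
    return p @ succ  # mass at each statement splits across its successors

p = torch.tensor([1.0, 0.0, 0.0])      # execution starts at statement 0
succ = torch.tensor([[0.0, 0.7, 0.3],  # stmt 0 branches: 70% to 1, 30% to 2
                     [0.0, 0.0, 1.0],  # stmt 1 falls through to 2
                     [0.0, 0.0, 1.0]]) # stmt 2 is the exit (self-loop)
print(pointer_step(p, succ))           # tensor([0.0, 0.7, 0.3])
```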
arXiv Detail & Related papers (2020-10-23T19:12:30Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specifics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
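NuLog's framing: mask one token of a log message at a time and train a model to reconstruct it; positions the model predicts reliably belong to the constant template, the rest are parameters. A sketch of the masking side is below; the whitespace tokenizer and the downstream model are placeholders.

```python
MASK = "<MASK>"

def masked_variants(log_line):
    """Yield one masked copy of the log message per token position.

    Each variant is a masked-language-modeling training example; after
    training, positions the model reconstructs with high confidence are
    treated as the constant template, the others as parameters.
    """
    tokens = log_line.split()    # placeholder tokenizer
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = MASK
        yield masked, tokens[i]  # (input tokens, target token)

for x, y in masked_variants("Connection from 10.0.0.7 closed"):
    print(x, "->", y)
```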
arXiv Detail & Related papers (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.