Test2Vec: An Execution Trace Embedding for Test Case Prioritization
- URL: http://arxiv.org/abs/2206.15428v1
- Date: Tue, 28 Jun 2022 20:38:36 GMT
- Title: Test2Vec: An Execution Trace Embedding for Test Case Prioritization
- Authors: Emad Jabbar, Soheila Zangeneh, Hadi Hemmati, Robert Feldt
- Abstract summary: Execution traces of test cases can be a good alternative for abstracting their behavior in automated testing tasks.
We propose a novel embedding approach, Test2Vec, that maps test execution traces to a latent space.
Results show that our proposed TP improves on the best alternatives by 41.80% in terms of the median normalized rank of the first failing test case (FFR).
- Score: 12.624724734296342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most automated software testing tasks can benefit from the abstract
representation of test cases. Traditionally, this is done by encoding test
cases based on their code coverage. Specification-level criteria can replace
code coverage to better represent test cases' behavior, but they are often not
cost-effective. In this paper, we hypothesize that execution traces of test
cases can be a good alternative for abstracting their behavior in automated
testing tasks. We propose a novel embedding approach, Test2Vec, that maps test
execution traces to a latent space. We evaluate this representation in the test
case prioritization (TP) task. Our default TP method is based on the similarity
of the embedded vectors to historical failing test vectors. We also study an
alternative based on the diversity of test vectors. Finally, we propose a
method to decide which TP to choose for a given test suite. The experiments
are based on several real and seeded faults with over a million execution
traces. Results show that our proposed TP improves on the best alternatives by
41.80% in terms
of the median normalized rank of the first failing test case (FFR). It
outperforms traditional code coverage-based approaches by 25.05% and 59.25% in
terms of median APFD and median normalized FFR.
Related papers
- TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark [24.14654309612826]
TestGenEval comprises 68,647 tests from 1,210 code and test file pairs across 11 well-maintained Python repositories.
It covers initial test authoring, test suite completion, and code coverage improvement.
We evaluate several popular models, with sizes ranging from 7B to 405B parameters.
arXiv Detail & Related papers (2024-10-01T14:47:05Z) - Feature-oriented Test Case Selection and Prioritization During the Evolution of Highly-Configurable Systems [1.5225153671736202]
We introduce FeaTestSelPrio, a feature-oriented test case selection and prioritization approach for HCSs.
Our approach selects more tests and takes longer to execute than a changed-file-oriented approach, which is used as the baseline.
The prioritization step reduces the average test budget in 86% of the failed commits.
arXiv Detail & Related papers (2024-06-21T16:39:10Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation [64.9230895853942]
Domain generalization can be arbitrarily hard without exploiting target domain information.
Test-time adaptive (TTA) methods are proposed to address this issue.
In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC).
arXiv Detail & Related papers (2023-04-25T04:23:13Z) - Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures.
We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z) - CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
arXiv Detail & Related papers (2022-07-21T10:18:37Z) - Supervised Learning for Coverage-Directed Test Selection in
Simulation-Based Verification [0.0]
We introduce a novel method for automatic constraint extraction and test selection.
Coverage-directed test selection is based on supervised learning from coverage feedback.
We show how coverage-directed test selection can reduce manual constraint writing, prioritise effective tests, reduce verification resource consumption, and accelerate coverage closure on a large, real-life industrial hardware design.
arXiv Detail & Related papers (2022-05-17T17:49:30Z) - DeepOrder: Deep Learning for Test Case Prioritization in Continuous
Integration Testing [6.767885381740952]
This work introduces DeepOrder, a deep learning-based model that treats test case prioritization as a regression problem.
DeepOrder ranks test cases based on the historical record of test executions from any number of previous test cycles.
We experimentally show that a deep neural network, used as a simple regression model, can efficiently prioritize test cases in continuous integration testing.
arXiv Detail & Related papers (2021-10-14T15:10:38Z) - TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning
Tasks [14.547623982073475]
Deep learning systems are notoriously difficult to test and debug.
To reduce test cost, it is essential to perform test selection and label only the selected "high-quality" bug-revealing test inputs.
We propose a novel test prioritization technique that brings order into the unlabeled test instances according to their bug-revealing capabilities, namely TestRank.
arXiv Detail & Related papers (2021-05-21T03:41:10Z) - Noisy Adaptive Group Testing using Bayesian Sequential Experimental
Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
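As an aside on the Dorfman result above, the efficiency gain is easy to reproduce under the idealized textbook assumptions (perfect tests, independent infections with prevalence p); the sketch below is illustrative and not the paper's noisy-setting algorithm.

```python
def expected_tests_per_person(p: float, g: int) -> float:
    """Expected tests per person with pool size g: one pooled test is
    shared by g people, plus g individual retests whenever the pool is
    positive, which happens with probability 1 - (1 - p) ** g."""
    return 1.0 / g + (1.0 - (1.0 - p) ** g)

if __name__ == "__main__":
    p = 0.01  # 1% prevalence; individual testing costs 1.0 test/person
    for g in (2, 5, 10, 20):
        print(f"pool size {g:2d}: {expected_tests_per_person(p, g):.3f} tests/person")
```

At 1% prevalence, pools of ten people need roughly 0.196 tests per person, about a fivefold saving over individual testing, which is Dorfman's original observation.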