Coverage Goal Selector for Combining Multiple Criteria in Search-Based
Unit Test Generation
- URL: http://arxiv.org/abs/2309.07518v2
- Date: Thu, 4 Jan 2024 11:58:13 GMT
- Title: Coverage Goal Selector for Combining Multiple Criteria in Search-Based
Unit Test Generation
- Authors: Zhichao Zhou, Yuming Zhou, Chunrong Fang, Zhenyu Chen, Xiapu Luo,
Jingzhu He, and Yutian Tang
- Abstract summary: Unit testing is critical to ensuring correctness of programming units in a program.
Search-based software testing (SBST) is an automated approach to generating test cases.
- Score: 26.121557667962556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unit testing is critical to the software development process, ensuring the
correctness of basic programming units in a program (e.g., a method).
Search-based software testing (SBST) is an automated approach to generating
test cases. SBST generates test cases with genetic algorithms by specifying the
coverage criterion (e.g., branch coverage). However, a good test suite must
have different properties, which cannot be captured using an individual
coverage criterion. Therefore, the state-of-the-art approach combines multiple
criteria to generate test cases. However, since combining multiple coverage
criteria introduces multiple optimization objectives, it hurts the test suites'
coverage of certain criteria compared with using a single criterion. To cope with
this problem, we propose a novel approach named \textbf{smart selection}. Based
on the coverage correlations among criteria and the subsumption relationships
among coverage goals, smart selection selects a subset of coverage goals to
reduce the number of optimization objectives without losing the properties
captured by any criterion. We conduct experiments to evaluate smart selection on $400$
Java classes with three state-of-the-art genetic algorithms under the
$2$-minute budget. On average, smart selection outperforms combining all goals
on $65.1\%$ of the classes that show significant differences between the two
approaches. Second, we conduct experiments to verify our assumptions about the
relationships among coverage criteria. Furthermore, we assess the coverage
performance of smart selection under varying budgets of $5$, $8$, and $10$
minutes and explore its effect on bug detection, confirming the advantage of
smart selection over combining all goals.
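
The abstract describes smart selection only at a high level, so the sketch below illustrates one way such a goal filter could look before the genetic search starts. It is a minimal sketch, not the authors' implementation: the CoverageGoal interface, the subsumes() predicate, the criteriaCoveredByCorrelation set, and the SmartSelectionSketch class are illustrative assumptions standing in for the paper's subsumption relationships and coverage correlations (EvoSuite's real goal classes differ).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal, assumption-based sketch of the "smart selection" idea from the
// abstract; NOT the paper's implementation.
final class SmartSelectionSketch {

    /** Hypothetical stand-in for one coverage goal (e.g., a branch or a line). */
    interface CoverageGoal {
        String criterion();                   // e.g. "BRANCH", "LINE", "METHOD"
        boolean subsumes(CoverageGoal other); // covering this goal implies covering `other`
    }

    /**
     * Greedy one-pass filter: drop a goal if its whole criterion is assumed to be
     * covered via a correlated criterion, or if an already-kept goal subsumes it.
     * The result is a smaller set of optimization objectives for the search that
     * still represents every criterion.
     */
    static List<CoverageGoal> select(List<CoverageGoal> allGoals,
                                     Set<String> criteriaCoveredByCorrelation) {
        List<CoverageGoal> kept = new ArrayList<>();
        for (CoverageGoal goal : allGoals) {
            if (criteriaCoveredByCorrelation.contains(goal.criterion())) {
                continue; // correlated criterion: its goals are optimized implicitly
            }
            boolean subsumed = kept.stream().anyMatch(k -> k.subsumes(goal));
            if (!subsumed) {
                kept.add(goal); // keep only goals not implied by already-kept ones
            }
        }
        return kept;
    }
}
```

A typical use of such a filter would be to prune, for instance, line goals that are implied by branch goals in the same block; the sketch only shows where the subsumption- and correlation-based pruning would sit relative to the search, not how those relations are derived in the paper.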
Related papers
- B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests [16.19318541132026]
We show that within a Bayesian framework, the optimal selection strategy can be defined based on the posterior probability of the observed passing states between solutions and tests.
We propose an efficient approach for approximating this optimal (yet uncomputable) strategy, where the approximation error is bounded by the correctness of prior knowledge.
arXiv Detail & Related papers (2024-09-13T10:22:08Z)
- Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization [52.80408805368928]
We introduce a novel greedy-style subset selection algorithm for batch acquisition.
Our experiments on the red fluorescent proteins show that our proposed method achieves the baseline performance with 1.69x fewer queries.
arXiv Detail & Related papers (2024-06-21T05:57:08Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z)
- Class-Conditional Conformal Prediction with Many Classes [60.8189977620604]
We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores.
We find that clustered conformal prediction typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
arXiv Detail & Related papers (2023-06-15T17:59:02Z)
- Cost-Effective Online Contextual Model Selection [14.094350329970537]
We formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context.
The goal is to output the best model for any given context without obtaining an excessive amount of labels.
We propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection.
arXiv Detail & Related papers (2022-07-13T08:22:22Z)
- Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios [79.02009938011447]
We propose a sampling scheme, which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z)
- Ensemble pruning via an integer programming approach with diversity constraints [0.0]
In this paper, we consider a binary classification problem and propose an integer programming (IP) approach for selecting optimal subsets.
We also propose constraints to ensure minimum diversity levels in the ensemble.
Our approach yields competitive results when compared to some of the best and most widely used pruning methods in the literature.
arXiv Detail & Related papers (2022-05-02T17:59:11Z)
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
- Feature Selection Methods for Cost-Constrained Classification in Random Forests [3.4806267677524896]
Cost-sensitive feature selection describes a feature selection problem, where features raise individual costs for inclusion in a model.
Random Forests define a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees.
We propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures.
arXiv Detail & Related papers (2020-08-14T11:39:52Z)
- Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
arXiv Detail & Related papers (2020-01-29T09:40:58Z)