Evaluating Search-Based Software Microbenchmark Prioritization
- URL: http://arxiv.org/abs/2211.13525v4
- Date: Thu, 18 Apr 2024 13:13:48 GMT
- Title: Evaluating Search-Based Software Microbenchmark Prioritization
- Authors: Christoph Laaber, Tao Yue, Shaukat Ali
- Abstract summary: This paper empirically evaluates single- and multi-objective search-based microbenchmark prioritization techniques.
We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines.
- Score: 6.173678645884399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that software performance does not degrade after a code change is paramount. A solution is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests, which, however, often becomes infeasible due to extensive runtimes. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, drastically reducing their potential application. In this paper, we empirically evaluate single- and multi-objective search-based microbenchmark prioritization techniques to understand whether they are more effective and efficient than greedy, coverage-based techniques. For this, we devise three search objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that search algorithms (SAs) are only competitive with but do not outperform the best greedy, coverage-based baselines. However, a simple greedy technique utilizing solely the performance change history (without coverage information) is equally or more effective than the best coverage-based techniques while being considerably more efficient, with a runtime overhead of less than 1%. These results show that simple, non-coverage-based techniques are a better fit for microbenchmarks than complex coverage-based techniques.
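The headline result invites a compact illustration. Below is a minimal sketch, not the authors' implementation, of a greedy ordering driven solely by historical performance change magnitudes; the function name and data layout are hypothetical.

```python
from typing import Dict, List

def prioritize_by_change_history(history: Dict[str, List[float]]) -> List[str]:
    """Order microbenchmarks so those with the largest historical
    performance changes execute first.

    `history` maps a benchmark name to the absolute performance change
    magnitudes (e.g., percent slowdown/speedup) observed in past runs.
    """
    def mean_change(bench: str) -> float:
        changes = history[bench]
        return sum(changes) / len(changes) if changes else 0.0

    return sorted(history, key=mean_change, reverse=True)

# bench_c changed the most historically, so it is scheduled first.
order = prioritize_by_change_history({
    "bench_a": [0.5, 1.2],
    "bench_b": [],           # no history yet: scheduled last
    "bench_c": [10.0, 8.5],
})
print(order)  # ['bench_c', 'bench_a', 'bench_b']
```

Because no coverage instrumentation is involved, a scheme of this shape is consistent with the sub-1% runtime overhead the abstract reports.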
Related papers
- DOCE: Finding the Sweet Spot for Execution-Based Code Generation [69.5305729627198]
We propose a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components.
Our findings highlight the importance of execution-based methods and the gap between execution-based and execution-free methods.
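Of the listed components, MBR decoding has the crispest algorithmic core. A generic sketch, not DOCE's exact formulation: choose the candidate program with the highest average agreement with the other candidates, where the `utility` function (a stand-in name) is typically execution-based.

```python
from typing import Callable, List

def mbr_select(candidates: List[str],
               utility: Callable[[str, str], float]) -> str:
    """Minimum Bayes risk (MBR) selection over candidate programs.

    Scores each candidate by its mean utility against every other
    candidate and returns the one most consistent with the pool. For
    code generation, `utility` can compare execution outputs, e.g.
    return 1.0 when two programs agree on a shared set of test inputs.
    """
    def expected_utility(cand: str) -> float:
        others = [o for o in candidates if o is not cand]
        return sum(utility(cand, o) for o in others) / max(len(others), 1)

    return max(candidates, key=expected_utility)
```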
arXiv Detail & Related papers (2024-08-25T07:10:36Z)
- Segment-Based Test Case Prioritization: A Multi-objective Approach [8.972346309150199]
Test case prioritization (TCP) is a cost-efficient solution to schedule test cases in an execution order that maximizes an objective function.
We introduce a multi-objective optimization approach to prioritize UI test cases using evolutionary search algorithms and four coverage criteria.
Our approach significantly outperforms other methods in terms of Average Percentage of Faults Detected (APFD) and APFD with Cost.
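For reference, APFD has a standard closed form: with $n$ test cases and $m$ faults, $APFD = 1 - \frac{TF_1 + \dots + TF_m}{nm} + \frac{1}{2n}$, where $TF_i$ is the 1-based rank of the first test detecting fault $i$. An illustrative implementation (not from the paper):

```python
from typing import List

def apfd(first_detecting_ranks: List[int], n_tests: int) -> float:
    """Average Percentage of Faults Detected (APFD).

    `first_detecting_ranks[i]` is the 1-based position, in the
    prioritized order, of the first test that detects fault i.
    Values closer to 1.0 mean faults are detected earlier.
    """
    m = len(first_detecting_ranks)
    return 1 - sum(first_detecting_ranks) / (n_tests * m) + 1 / (2 * n_tests)

# 10 tests; 3 faults first detected by the tests ranked 1, 2, and 4:
print(apfd([1, 2, 4], n_tests=10))  # ~0.817
```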
arXiv Detail & Related papers (2024-08-01T16:51:01Z)
- On Test Sequence Generation using Multi-Objective Particle Swarm Optimization [0.2999888908665658]
Software testing is an important and essential part of the software development life cycle.
In the software industry, testing costs can account for about 35% to 40% of the total cost of a software project.
arXiv Detail & Related papers (2024-04-09T18:35:21Z)
- A Comprehensively Improved Hybrid Algorithm for Learning Bayesian Networks: Multiple Compound Memory Erasing [0.0]
This paper presents a new hybrid algorithm, MCME (multiple compound memory erasing).
MCME retains the advantages of constraint-based and score-based methods, addresses the shortcomings of conditional independence (CI) tests, and innovates in the scoring function used in the direction discrimination stage.
Extensive experiments show that MCME performs better than, or on par with, existing algorithms.
arXiv Detail & Related papers (2022-12-05T12:52:07Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can search directly on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike in prior work, LTH strategies applied naively can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
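For context on the 14% figure, nDCG@10 discounts graded relevance by rank and normalizes against the best possible ordering. A standard implementation sketch, not tied to this paper's code:

```python
import math
from typing import List

def ndcg_at_k(relevances: List[float], k: int = 10) -> float:
    """nDCG@k for a single ranked result list.

    `relevances` holds graded relevance judgments in the order the
    retriever returned the documents. Gains are discounted by
    log2(rank + 1) and normalized by the ideal DCG, giving [0, 1].
    """
    def dcg(rels: List[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([0, 2, 1, 0, 3]))  # imperfect ranking, so < 1.0
```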
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that incurs no additional computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
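The summary does not define ALR precisely, so the following is only one plausible reading, stated as an assumption: each predicted novel-class weight vector is rescaled to the mean L2 norm of the pretrained base-class weights so that novel and base logits live on a comparable scale.

```python
import numpy as np

def adaptive_length_rescale(novel_w: np.ndarray, base_w: np.ndarray) -> np.ndarray:
    """Illustrative take on adaptive length re-scaling (ALR).

    Assumption (not spelled out in the summary above): rows are
    per-class weight vectors, and the predicted novel weights are
    rescaled to the mean L2 norm of the pretrained base weights.
    """
    target_norm = np.linalg.norm(base_w, axis=1).mean()
    norms = np.linalg.norm(novel_w, axis=1, keepdims=True)
    return novel_w * (target_norm / np.clip(norms, 1e-12, None))
```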
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-based Systems [0.76146285961466]
We present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems.
DeLag simultaneously searches for multiple latency patterns while optimizing precision, recall, and dissimilarity.
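Optimizing three objectives at once yields a Pareto front rather than a single winner. A generic non-dominated filter illustrates the idea (DeLag's actual search loop is not detailed in this summary):

```python
from typing import List, Tuple

Objectives = Tuple[float, float, float]  # (precision, recall, dissimilarity)

def pareto_front(solutions: List[Objectives]) -> List[Objectives]:
    """Keep solutions not dominated under three maximized objectives.

    A solution is dominated if some other solution is at least as good
    on every objective and strictly better on at least one.
    """
    def dominates(a: Objectives, b: Objectives) -> bool:
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]
```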
arXiv Detail & Related papers (2021-10-21T13:59:32Z)
- Don't Search for a Search Method -- Simple Heuristics Suffice for Adversarial Text Attacks [11.196974000738729]
We implement an algorithm inspired by zeroth-order optimization-based attacks and compare it with the benchmark results in the TextAttack framework.
Surprisingly, we find that optimization-based methods do not yield any improvement in a constrained setup.
We conclude from these results that current TextAttack benchmark tasks are too easy and constraints are too strict, preventing meaningful research on black-box adversarial text attacks.
arXiv Detail & Related papers (2021-09-16T12:22:17Z)
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
- Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation [65.92151529708036]
We introduce greedy policy search (GPS) as a simple but high-performing method for learning a policy of test-time augmentation.
We demonstrate that augmentation policies learned with GPS achieve superior predictive performance on image classification problems.
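The greedy loop itself is easy to sketch. Assuming `evaluate` scores a candidate policy on held-out validation data (the paper reports using a calibrated log-likelihood objective), the skeleton looks like:

```python
from typing import Callable, List

def greedy_policy_search(transforms: List[str],
                         evaluate: Callable[[List[str]], float],
                         budget: int) -> List[str]:
    """Greedy policy search (GPS) in outline form.

    Grows a test-time augmentation policy one transform at a time:
    each step appends whichever candidate transform most improves the
    validation score, until the budget is spent. Transforms may repeat.
    """
    policy: List[str] = []
    for _ in range(budget):
        best = max(transforms, key=lambda t: evaluate(policy + [t]))
        policy.append(best)
    return policy
```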
arXiv Detail & Related papers (2020-02-21T02:57:13Z)