Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
- URL: http://arxiv.org/abs/2410.11769v2
- Date: Wed, 16 Oct 2024 08:30:57 GMT
- Title: Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
- Authors: Lev Sorokin, Damir Safin, Shiva Nejati
- Abstract summary: We argue that search-based software testing (SBST) based on Pareto optimization is inadequate for covering failure-inducing areas within a search domain.
We measure the coverage of failure-revealing test inputs in the input space using a metric that we refer to as the Coverage Inverted Distance quality indicator.
- Score: 2.038863628148453
- Abstract: Search-based software testing (SBST) is a widely adopted technique for testing complex systems with large input spaces, such as Deep Learning-enabled (DL-enabled) systems. Many SBST techniques focus on Pareto-based optimization, where multiple objectives are optimized in parallel to reveal failures. However, it is important to ensure that identified failures are spread throughout the entire failure-inducing area of a search domain and not clustered in a sub-region. This ensures that identified failures are semantically diverse and reveal a wide range of underlying causes. In this paper, we present a theoretical argument explaining why testing based on Pareto optimization is inadequate for covering failure-inducing areas within a search domain. We support our argument with empirical results obtained by applying two widely used types of Pareto-based optimization techniques, namely NSGA-II (an evolutionary algorithm) and OMOPSO (a swarm-based Pareto-optimization algorithm), to two DL-enabled systems: an industrial Automated Valet Parking (AVP) system and a system for classifying handwritten digits. We measure the coverage of failure-revealing test inputs in the input space using a metric that we refer to as the Coverage Inverted Distance quality indicator. Our results show that NSGA-II-based search and OMOPSO are not more effective than a naïve random search baseline in covering test inputs that reveal failures. The replication package for this study is available in a GitHub repository.
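The Coverage Inverted Distance (CID) indicator is only named here, not defined. Below is a minimal sketch, assuming CID follows the standard Inverted Generational Distance recipe (the mean distance from each reference failure-revealing input to the nearest failure found by the search); the 2D input space and the two failure sets are hypothetical, chosen to show why clustered failures score worse than spread ones.

```python
# A minimal sketch of an IGD-style coverage indicator. The paper's Coverage
# Inverted Distance (CID) is only named in the abstract; here we assume the
# standard Inverted Generational Distance recipe: the mean distance from each
# reference failure-revealing input to its nearest input found by the search.
# Lower values mean the found failures cover the failure-inducing area better.
import numpy as np

def igd_style_coverage(reference: np.ndarray, found: np.ndarray) -> float:
    """Mean distance from each reference point to its nearest found point."""
    # Pairwise Euclidean distances, shape (n_reference, n_found).
    d = np.linalg.norm(reference[:, None, :] - found[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(0)

# Hypothetical 2D input space; the "true" failure-inducing area is the unit square.
reference_failures = rng.uniform(0.0, 1.0, size=(500, 2))

# A search that clusters its failures in one corner (the behavior the paper
# attributes to Pareto-based optimization) ...
clustered = rng.uniform(0.0, 0.2, size=(30, 2))
# ... versus the same number of failures spread across the whole area
# (e.g., what a random search baseline might produce).
spread = rng.uniform(0.0, 1.0, size=(30, 2))

print("clustered search:", igd_style_coverage(reference_failures, clustered))
print("spread search:   ", igd_style_coverage(reference_failures, spread))
# The clustered set scores markedly worse (higher), illustrating why coverage
# must be measured over the input space rather than by failure count alone.
```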
Related papers
- Dense Object Detection Based on De-homogenized Queries [12.33849715319161]
Dense object detection is widely used in autonomous driving, video surveillance, and other fields.
Currently, detection methods based on greedy algorithms, such as non-maximum suppression (NMS), often produce many repetitive predictions or missed detections in dense scenarios.
Using the end-to-end DETR (DEtection TRansformer), a detector that can fold NMS-style post-processing de-duplication into the network, we found that homogeneous queries in query-based detectors reduce both the network's de-duplication capability and the encoder's learning efficiency.
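As background for the de-duplication behavior described above, here is a minimal sketch of standard greedy NMS (a textbook algorithm, not code from this paper). It shows the dense-scene failure mode: when two distinct objects overlap above the IoU threshold, the lower-scored true detection is suppressed.

```python
# A minimal sketch of greedy non-maximum suppression (the standard algorithm,
# not code from the paper). In dense scenes, two genuinely distinct objects can
# overlap above the IoU threshold, so the lower-scored true detection is
# suppressed -- the failure mode the summary above alludes to.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thr=0.5):
    order = np.argsort(scores)[::-1]   # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Suppress every remaining box that overlaps box i too strongly.
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the overlapping neighbor (index 1) is dropped
```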
arXiv Detail & Related papers (2025-02-11T02:36:10Z) - Test Generation Strategies for Building Failure Models and Explaining Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
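The rule-inference idea can be illustrated with a small, hypothetical sketch: a scikit-learn decision tree trained on labeled test inputs yields human-readable rules of the kind a failure model captures. The feature names and labeling below are invented for illustration; the paper's surrogate-assisted approach is more elaborate.

```python
# A minimal sketch of the underlying idea -- learning interpretable rules over
# test inputs -- using a scikit-learn decision tree. The feature names and data
# are hypothetical; the paper's surrogate-assisted approach is more elaborate.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Hypothetical test-input features: [road_curvature, pedestrian_speed].
X = rng.uniform(0, 1, size=(200, 2))
# Hypothetical labeling: failing tests with pedestrian_speed > 0.8 are
# "spurious" (unrealistic inputs); everything else is a genuine failure.
y = (X[:, 1] > 0.8).astype(int)

model = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(model, feature_names=["road_curvature", "pedestrian_speed"]))
# The printed tree is a human-readable rule, e.g. "pedestrian_speed > 0.80
# -> spurious", which is the kind of failure model the summary describes.
```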
arXiv Detail & Related papers (2023-12-09T18:36:15Z) - Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework for semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs [33.36532022853583]
Our work targets searching for feasible adversarial perturbations to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting.
Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in a multi-armed bandit problem.
Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
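A hedged sketch of the bandit formulation above: each categorical feature is an arm, and the reward is the (here, simulated) loss increase from perturbing that feature. The epsilon-greedy policy and reward model below are generic stand-ins, not FEAT's actual algorithm.

```python
# A hedged sketch of the bandit view described above: each categorical feature
# is an arm, and "reward" is how much modifying that feature increases a
# victim model's loss. The epsilon-greedy policy and the reward function are
# generic stand-ins, not FEAT's actual algorithm.
import numpy as np

rng = np.random.default_rng(2)
n_features = 8                      # arms: one per categorical feature
counts = np.zeros(n_features)       # times each arm was pulled
values = np.zeros(n_features)       # running mean reward per arm

def attack_reward(feature_idx: int) -> float:
    """Hypothetical: noisy loss increase when perturbing this feature."""
    true_effect = np.linspace(0.1, 0.8, n_features)  # unknown to the attacker
    return rng.normal(true_effect[feature_idx], 0.1)

epsilon = 0.1
for step in range(500):
    if rng.random() < epsilon:                 # explore a random feature
        arm = int(rng.integers(n_features))
    else:                                      # exploit the best estimate
        arm = int(np.argmax(values))
    r = attack_reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean update

print("estimated per-feature effect:", values.round(2))
print("feature to perturb first:", int(np.argmax(values)))
```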
arXiv Detail & Related papers (2022-12-13T18:45:00Z) - DI-NIDS: Domain Invariant Network Intrusion Detection System [9.481792073140204]
In various applications, such as computer vision, domain adaptation techniques have been successful.
In the case of network intrusion detection, however, state-of-the-art domain adaptation approaches have had limited success.
We propose to extract domain invariant features using adversarial domain adaptation from multiple network domains.
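Adversarial domain adaptation of this kind is commonly built on a gradient reversal layer, which trains the feature extractor to fool a domain classifier; a minimal PyTorch sketch of that standard building block follows (not necessarily DI-NIDS's exact architecture).

```python
# Adversarial domain adaptation is commonly implemented with a gradient
# reversal layer (GRL): features flow unchanged to a domain classifier in the
# forward pass, but the gradient is negated on the way back, pushing the
# feature extractor toward domain-invariant representations. This is the
# standard building block, not necessarily DI-NIDS's exact architecture.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)                    # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None    # reversed, scaled gradient

features = torch.randn(4, 16, requires_grad=True)
reversed_feats = GradReverse.apply(features, 1.0)
reversed_feats.sum().backward()
print(features.grad[0, :4])  # all -1: gradients were negated by the GRL
```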
arXiv Detail & Related papers (2022-10-15T10:26:22Z) - $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple but efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS search process.
Experimental results on NAS-Bench-201 show that our method helps stabilize the search process and makes the searched network more transferable across different datasets.
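As an illustrative stand-in (the exact Beta-Decay formulation is defined in the paper), the sketch below only shows where a decay-style penalty on softmaxed architecture parameters would enter a DARTS search loss; the penalty form and weight are assumptions.

```python
# An illustrative stand-in for regularizing DARTS architecture parameters: a
# decay penalty on the (softmaxed) architecture weights is added to the search
# loss. The exact Beta-Decay formulation is defined in the paper; this sketch
# only shows where such a term enters the search step.
import torch

n_edges, n_ops = 14, 8
# Architecture parameters (one weight per candidate op per edge); the small
# random initialization is arbitrary.
beta = (0.1 * torch.randn(n_edges, n_ops)).requires_grad_()

def search_loss(task_loss, reg_weight=0.05):
    # Assumed decay-style penalty on the activated (softmaxed) architecture
    # weights; it discourages the weights from saturating early in the search.
    decay = torch.softmax(beta, dim=-1).pow(2).sum()
    return task_loss + reg_weight * decay

loss = search_loss(torch.tensor(1.0))   # dummy task loss for illustration
loss.backward()
print(beta.grad.abs().max() > 0)        # the penalty shapes the architecture update
```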
arXiv Detail & Related papers (2022-03-03T11:47:14Z) - Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain.
We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection.
Our method brings large improvements of 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z) - Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non-adversarial functions, and it is challenging to guarantee their correct behavior in safety-critical contexts.
Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures.
We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
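A related building block for reasoning about the absence of failures is interval bound propagation (IBP), sketched below; this is a standard verification technique, named plainly as such, not the paper's specific optimization-integrated procedure.

```python
# A sketch of interval bound propagation (IBP), a standard building block in
# ReLU-network verification (a related technique, not the paper's algorithm).
# Given an input box, it computes a box that provably contains every possible
# output; if the failure region lies outside that box, failures are absent for
# the whole input region.
import numpy as np

def ibp_layer(lo, hi, W, b):
    """Propagate an input interval [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = np.clip(W, 0, None), np.clip(W, None, 0)
    new_lo = W_pos @ lo + W_neg @ hi + b   # worst case for the lower bound
    new_hi = W_pos @ hi + W_neg @ lo + b   # worst case for the upper bound
    return new_lo, new_hi

def ibp_relu_net(lo, hi, layers):
    for W, b in layers[:-1]:
        lo, hi = ibp_layer(lo, hi, W, b)
        lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # ReLU is monotone
    return ibp_layer(lo, hi, *layers[-1])               # final affine layer

rng = np.random.default_rng(3)
layers = [(rng.normal(size=(4, 2)), rng.normal(size=4)),
          (rng.normal(size=(1, 4)), rng.normal(size=1))]
lo, hi = ibp_relu_net(np.array([-0.1, -0.1]), np.array([0.1, 0.1]), layers)
print("certified output range:", lo, hi)
```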
arXiv Detail & Related papers (2020-10-07T08:19:48Z) - AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously.
The former suffers greatly from extreme foreground-background imbalance due to the large number of anchors.
This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
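The ranking idea can be sketched with a generic pairwise hinge loss over anchor scores, shown below; this is a stand-in illustration, not the exact AP-loss with its error-driven update scheme.

```python
# A hedged sketch of the ranking idea above: instead of classifying each anchor
# independently, penalize every (negative, positive) anchor pair in which the
# negative outscores the positive. This is a generic pairwise hinge ranking
# loss, not the exact AP-loss with its error-driven update scheme.
import torch

def pairwise_ranking_loss(scores, labels, margin=0.1):
    pos = scores[labels == 1]           # scores of foreground anchors
    neg = scores[labels == 0]           # scores of background anchors
    # (n_neg, n_pos) matrix of rank violations: negative within `margin` of
    # (or above) a positive contributes to the loss.
    violations = torch.relu(neg[:, None] - pos[None, :] + margin)
    return violations.mean()

scores = torch.tensor([0.9, 0.2, 0.8, 0.1], requires_grad=True)
labels = torch.tensor([1, 0, 0, 1])     # anchors 0 and 3 are foreground
loss = pairwise_ranking_loss(scores, labels)
loss.backward()
print(loss.item(), scores.grad)
# The loss pushes down negatives ranked above any positive (here anchor 2 vs 3),
# directly optimizing the ordering that average precision measures.
```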
arXiv Detail & Related papers (2020-08-17T13:22:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.