Related papers: On Rank Aggregating Test Prioritizations

URL: http://arxiv.org/abs/2412.00015v1
Date: Fri, 15 Nov 2024 11:17:37 GMT
Title: On Rank Aggregating Test Prioritizations
Authors: Shouvick Mondal, Tse-Hsun Chen
Abstract summary: Test case prioritization (TCP) has been an effective strategy to optimize regression testing. We propose Ensemble Test Prioritization (EnTP) as a three-stage pipeline involving: (i) ensemble selection, (ii) rank aggregation, and (iii) test case execution.
Score: 1.7802147489386628
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Test case prioritization (TCP) has been an effective strategy to optimize regression testing. Traditionally, test cases are ordered based on some heuristic and rerun against the version under test with the goal of yielding a high failure throughput. Almost four decades of TCP research have seen extensive contributions in the light of individual prioritization strategies. However, test case prioritization via preference aggregation has largely been unexplored. We envision this methodology as an opportunity to obtain robust prioritizations by consolidating multiple standalone ranked lists, i.e., performing a consensus. In this work, we propose Ensemble Test Prioritization (EnTP) as a three-stage pipeline involving: (i) ensemble selection, (ii) rank aggregation, and (iii) test case execution. We evaluate EnTP on 20 open-source C projects from the Software-artifact Infrastructure Repository and GitHub (totaling 694,512 SLOC, 280 versions, and 69,305 system-level test cases). We employ an ensemble of 16 standalone prioritization plans, four of which are imposed due to respective state-of-the-art approaches. We build EnTP on the foundations of Hansie, an existing framework on consensus prioritization, and show that EnTP's diversity-based ensemble selection budget of top-75% followed by rank aggregation can outperform Hansie and the employed standalone prioritization approaches.
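Illustration: the rank aggregation stage mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say which aggregation rule EnTP or Hansie uses, so the Borda count below is a swapped-in, textbook consensus rule chosen purely for illustration. The function name `borda_aggregate` and the test IDs are hypothetical, and the ensemble selection and test execution stages are omitted.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several ranked lists of test IDs into one consensus order.

    Assumption: Borda count is used here only as an example rule; it is
    not claimed to be the aggregation method of the paper.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, test_id in enumerate(ranking):
            # The first test in a list earns n points, the last earns 1.
            scores[test_id] += n - position
    # Higher total score runs earlier; ties are broken by test ID.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Three hypothetical standalone prioritization plans over the same tests.
plans = [
    ["t3", "t1", "t2", "t4"],
    ["t1", "t3", "t4", "t2"],
    ["t3", "t2", "t1", "t4"],
]
consensus = borda_aggregate(plans)
print(consensus)
```

In this sketch every plan ranks the same set of tests; a test missing from a plan would simply earn no points from it, which may or may not be the desired behavior.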

Related papers

Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation [0.0]
We study compute-optimal test-time strategies for verifiable execution-grounded (VEG) tasks. For dense-reward VEG tasks, compute should be allocated to sample diversity and intelligent selection rather than gradient adaptation.
arXiv Detail & Related papers (2026-02-07T19:29:07Z)
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference [71.09125259964684]
Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs). We study reward-filtered sequential inference, a simple procedure that selectively incorporates only high-reward generations into the context. On the theoretical side, we show that reward-filtered sequential inference yields strictly stronger guarantees than standard TTC paradigms.
arXiv Detail & Related papers (2025-12-04T08:21:33Z)
carps: A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks [61.79411281702448]
carps is a benchmark framework for Comprehensive Automated Research Performance Studies. We focus on the four most important types of HPO tasks: blackbox, multi-fidelity, multi-objective, and multi-fidelity-multi-objective. With 3,336 tasks from 5 community benchmark collections and 28 variants of 9 families, we offer the biggest go-to library to date.
arXiv Detail & Related papers (2025-06-06T15:01:39Z)
Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts [67.67746334493302]
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks. We propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP). We show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies.
arXiv Detail & Related papers (2025-04-15T17:35:56Z)
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection. Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
arXiv Detail & Related papers (2024-10-22T03:59:53Z)
Segment-Based Test Case Prioritization: A Multi-objective Approach [8.972346309150199]
Test case prioritization (TCP) is a cost-efficient solution to schedule test cases in an execution order that maximizes an objective function. We introduce a multi-objective optimization approach to prioritize UI test cases using evolutionary search algorithms and four coverage criteria. Our approach significantly outperforms other methods in terms of Average Percentage of Faults Detected (APFD) and APFD with Cost.
arXiv Detail & Related papers (2024-08-01T16:51:01Z)
UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation [66.05528698010697]
Test-Time Adaptation aims to adapt pre-trained models to the target domain during testing. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges. We propose a Unified Test-Time Adaptation benchmark, which is comprehensive and widely applicable.
arXiv Detail & Related papers (2024-07-29T15:04:53Z)
Feature-oriented Test Case Selection and Prioritization During the Evolution of Highly-Configurable Systems [1.5225153671736202]
We introduce FeaTestSelPrio, a feature-oriented test case selection and prioritization approach for HCSs. Our approach selects a greater number of tests and takes longer to execute than a changed-file-oriented approach, used as baseline. The prioritization step allows reducing the average test budget in 86% of the failed commits.
arXiv Detail & Related papers (2024-06-21T16:39:10Z)
Towards Explainable Test Case Prioritisation with Learning-to-Rank Models [6.289767078502329]
Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. We present and discuss scenarios that require different explanations and how the particularities of TCP could influence them.
arXiv Detail & Related papers (2024-05-22T16:11:45Z)
Constrained C-Test Generation via Mixed-Integer Programming [55.28927994487036]
This work proposes a novel method to generate C-Tests; a form of cloze tests (a gap filling exercise) where only the last part of a word is turned into a gap. In contrast to previous works that only consider varying the gap size or gap placement to achieve locally optimal solutions, we propose a mixed-integer programming (MIP) approach. We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses) under an open source license.
arXiv Detail & Related papers (2024-04-12T21:35:21Z)
Assisted Requirements Selection by Clustering [0.0]
Requirements selection is a complex multi-criteria decision process that has been the focus of many research works because a balance between business profits and investment is needed. This work studies the combination of the qualitative MoSCoW method and cluster analysis for requirements selection.
arXiv Detail & Related papers (2024-01-23T10:33:44Z)
Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation [26.121557667962556]
Unit testing is critical to ensuring correctness of programming units in a program. Search-based software testing (SBST) is an automated approach to generating test cases.
arXiv Detail & Related papers (2023-09-14T08:35:03Z)
On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts. We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
Sequential Kernelized Independence Testing [101.22966794822084]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z)
RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering [57.94658176442027]
We present RnG-KBQA, a Rank-and-Generate approach for KBQA. We achieve new state-of-the-art results on GrailQA and WebQSP datasets.
arXiv Detail & Related papers (2021-09-17T17:58:28Z)
Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits. We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
Test case prioritization using test case diversification and fault-proneness estimations [0.0]
We propose an approach for TCP that takes into account test case coverage data, bug history, and test case diversification. The diversification of test cases is preserved by incorporating fault-proneness on a clustering-based approach scheme. The experiments show that the proposed methods are superior to coverage-based TCP methods.
arXiv Detail & Related papers (2021-06-19T15:55:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.