Related papers: Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis

URL: http://arxiv.org/abs/2410.23578v1
Date: Thu, 31 Oct 2024 02:43:04 GMT
Title: Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis
Authors: Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang
Abstract summary: Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering. We aim to create an automated framework to detect flaky tests in quantum software.
Score: 4.554856650068748
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Flaky tests, which pass or fail inconsistently without code changes, are a major challenge in software engineering in general and in quantum software engineering in particular due to their complexity and probabilistic nature, leading to hidden issues and wasted developer effort. We aim to create an automated framework to detect flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods. Building on prior manual analysis of 14 quantum software repositories, we expanded the dataset and automated flaky test detection using transformers and cosine similarity. We conducted experiments with Large Language Models (LLMs) from the OpenAI GPT and Meta LLaMA families to assess their ability to detect and classify flaky tests from code and issue descriptions. Embedding transformers proved effective: we identified 25 new flaky tests, expanding the dataset by 54%. Top LLMs achieved an F1-score of 0.8871 for flakiness detection but only 0.5839 for root cause identification. We introduced an automated flaky test detection framework using machine learning, showing promising results but highlighting the need for improved root cause detection and classification in large quantum codebases. Future work will focus on improving detection techniques and developing automatic flaky test fixes.
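The abstract mentions matching via embeddings and cosine similarity, and reports results as F1-scores. The following is a minimal sketch of those two ideas, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a transformer embedding, and the function names, the example texts, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a transformer embedding: a bag-of-words count vector.

    Assumption: a real pipeline would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(known_flaky, candidates, threshold=0.3):
    """Keep candidates whose best similarity to any known flaky report
    meets the (hypothetical) threshold, sorted from most to least similar."""
    refs = [embed(r) for r in known_flaky]
    scored = []
    for text in candidates:
        vec = embed(text)
        best = max((cosine_similarity(vec, r) for r in refs), default=0.0)
        if best >= threshold:
            scored.append((best, text))
    return sorted(scored, reverse=True)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    known = ["test fails randomly due to random seed"]
    issues = [
        "test fails randomly without fixed seed",
        "update documentation typo",
    ]
    for score, text in rank_candidates(known, issues):
        print(f"{score:.3f}  {text}")
```

In this sketch the second issue shares no tokens with the known report, so it scores zero and is filtered out. With a real embedding model the similarity would reflect meaning and not just word overlap, and the threshold would need to be tuned on labeled data.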

Related papers

Systemic Flakiness: An Empirical Analysis of Co-Occurring Flaky Test Failures [6.824747267214373]
Flaky tests produce inconsistent outcomes without code changes. Developers spend 1.28% of their time repairing flaky tests at a monthly cost of $2,250. We show that flaky tests often exist in clusters, with co-occurring failures that share the same root causes, which we call systemic flakiness.
arXiv Detail & Related papers (2025-04-23T14:51:23Z)
Identifying Flaky Tests in Quantum Code: A Machine Learning Approach [5.323578182914324]
Indeterminacy, a fundamental characteristic of quantum systems, increases the likelihood of flaky tests in quantum programs. We present a novel machine learning platform that leverages multiple machine learning models to automatically detect flaky tests in quantum programs.
arXiv Detail & Related papers (2025-02-06T19:43:51Z)
What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation [3.8244417073114003]
We propose the Attention-based Self-guided Automatic Unit Test GenERation (AUGER) approach. AUGER contains two stages: defect detection and error triggering. In defect detection, it improves F1-score and Precision by 4.7% to 35.3%. In unit test generation, it triggers 23 to 84 more errors than state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2024-12-01T14:28:48Z)
Validation tests of Gaussian boson samplers with photon-number resolving detectors [44.99833362998488]
We apply phase-space simulation methods to partially verify recent experiments on Gaussian boson sampling (GBS) implementing photon-number resolving (PNR) detectors. We find that the data as a whole show discrepancies with theoretical predictions for perfect squeezing. We suggest that such validation tests could form the basis of feedback methods to improve GBS quantum computer experiments.
arXiv Detail & Related papers (2024-11-18T01:41:22Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer [0.0]
This work introduces a black-box API fuzz testing tool that employs Reinforcement Learning (RL) for vulnerability detection. The tool found a total of six unique vulnerabilities and achieved 55% code coverage.
arXiv Detail & Related papers (2024-07-19T14:43:35Z)
Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail. The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure. Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
Quantum Patch-Based Autoencoder for Anomaly Segmentation [44.99833362998488]
We introduce a patch-based quantum autoencoder (QPB-AE) for image anomaly segmentation. QPB-AE reconstructs the quantum state of the embedded input patches, computing an anomaly map directly from measurement. We evaluate its performance across multiple datasets and parameter configurations.
arXiv Detail & Related papers (2024-04-26T08:42:58Z)
FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests [3.0846824529023382]
Flaky tests can pass or fail non-deterministically, without alterations to a software system. State-of-the-art research incorporates machine learning solutions into flaky test detection and achieves reasonably good accuracy.
arXiv Detail & Related papers (2024-03-01T22:00:44Z)
Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles. The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour.
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
Identifying Flakiness in Quantum Programs [5.592360872268223]
We find flaky tests in 12 out of 14 quantum software repositories. We identify 46 distinct flaky test reports with 8 groups of causes and 7 common solutions. This work may interest practitioners, as it provides useful insight into the resolution of flaky tests in quantum programs.
arXiv Detail & Related papers (2023-02-07T04:55:34Z)
Validation tests of GBS quantum computers give evidence for quantum advantage with a decoherent target [62.997667081978825]
We use positive-P phase-space simulations of grouped count probabilities as a fingerprint for verifying multi-mode data. We show how one can disprove faked data, and apply this to a classical count algorithm.
arXiv Detail & Related papers (2022-11-07T12:00:45Z)
Experimental benchmarking of an automated deterministic error suppression workflow for quantum algorithms [0.0]
Excitement about the promise of quantum computers is tempered by the reality that the hardware remains exceptionally fragile and error-prone. We describe and experimentally test a fully autonomous workflow designed to deterministically suppress errors in quantum algorithms from the gate level through to circuit execution and measurement.
arXiv Detail & Related papers (2022-09-14T18:23:17Z)
SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning [62.997667081978825]
Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. We present SUPERNOVA, a system responsible for test selection and defect prevention while also functioning as an automation hub. Its direct impact has been observed as a reduction of 55% or more in testing hours for an undisclosed sports game title.
arXiv Detail & Related papers (2022-03-10T00:47:46Z)
DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE). It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase. Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
What is the Vocabulary of Flaky Tests? An Extended Replication [0.0]
We conduct an empirical study to assess the use of code identifiers to predict test flakiness. We validated the performance of trained models using datasets with other flaky tests and from different projects.
arXiv Detail & Related papers (2021-03-23T16:42:22Z)
Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD. Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.