Bug Classification in Quantum Software: A Rule-Based Framework and Its Evaluation
- URL: http://arxiv.org/abs/2506.10397v1
- Date: Thu, 12 Jun 2025 06:42:10 GMT
- Title: Bug Classification in Quantum Software: A Rule-Based Framework and Its Evaluation
- Authors: Mir Mohammad Yousuf, Shabir Ahmad Sofi,
- Abstract summary: This paper presents an automated framework for classifying issues in quantum software repositories by bug type, category, severity, and impacted quality attributes.<n>The framework achieves up to 85.21% accuracy, with F1-scores ranging from 0.7075 (severity) to 0.8393 (quality attribute)<n>A review of 1,550 quantum-specific bugs showed that over half involved quantum circuit-level problems, followed by gate errors and hardware-related issues.
- Score: 1.1510009152620668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate classification of software bugs is essential for improving software quality. This paper presents a rule-based automated framework for classifying issues in quantum software repositories by bug type, category, severity, and impacted quality attributes, with additional focus on quantum-specific bug types. The framework applies keyword and heuristic-based techniques tailored to quantum computing. To assess its reliability, we manually classified a stratified sample of 4,984 issues from a dataset of 12,910 issues across 36 Qiskit repositories. Automated classifications were compared with ground truth using accuracy, precision, recall, and F1-score. The framework achieved up to 85.21% accuracy, with F1-scores ranging from 0.7075 (severity) to 0.8393 (quality attribute). Statistical validation via paired t-tests and Cohen's Kappa showed substantial to almost perfect agreement for bug type (k = 0.696), category (k = 0.826), quality attribute (k = 0.818), and quantum-specific bug type (k = 0.712). Severity classification showed slight agreement (k = 0.162), suggesting room for improvement. Large-scale analysis revealed that classical bugs dominate (67.2%), with quantum-specific bugs at 27.3%. Frequent bug categories included compatibility, functional, and quantum-specific defects, while usability, maintainability, and interoperability were the most impacted quality attributes. Most issues (93.7%) were low severity; only 4.3% were critical. A detailed review of 1,550 quantum-specific bugs showed that over half involved quantum circuit-level problems, followed by gate errors and hardware-related issues.
Related papers
- PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology [48.732366302949515]
Large language models (LLMs) have achieved expert-level performance on standardized examinations, yet multiple-choice accuracy poorly reflects real-world clinical utility and safety.<n>We developed a human-in-the-loop pipeline to create expert rubrics for de-identified patient questions.<n>We evaluated 22 proprietary and open-source LLMs using an LLM-as-a-judge framework, measuring clinical completeness, factual accuracy, and web-search integration.
arXiv Detail & Related papers (2026-03-02T00:50:39Z) - Characterizing Bugs and Quality Attributes in Quantum Software: A Large-Scale Empirical Study [0.6445605125467574]
This study presents the first ecosystem-scale longitudinal analysis of software bugs across 123 open source quantum repositories from 2012 to 2024.<n>Full-stack libraries and compilers are the most bug-prone categories due to circuit, gate, and transpilation-related issues.<n>High-severity bugs cluster in cryptography, experimental computing, and compiler toolchains.
arXiv Detail & Related papers (2025-12-31T06:05:49Z) - BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills [59.003563837981886]
High quality bugs are key to training the next generation of language model based software engineering (SWE) agents.<n>We introduce a novel method for synthetic generation of difficult and diverse bugs.
arXiv Detail & Related papers (2025-10-22T17:58:56Z) - Improving IR-based Bug Localization with Semantics-Driven Query Reduction [0.9298382208776371]
We propose IQLoc, a novel approach to localize software bugs against bug reports.<n>We leverage the program semantics understanding of transformer-based models to reason about the suspiciousness of code.<n> IQLoc improves MAP by 91.67% for bug reports with stack traces, 72.73% for those that include code elements, and 65.38% for those containing only descriptions in natural language.
arXiv Detail & Related papers (2025-10-06T03:43:38Z) - TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them [58.04324690859212]
Large Language Models (LLMs) as automated evaluators (LLM-as-a-judge) has revealed critical inconsistencies in current evaluation frameworks.<n>We identify two fundamental types of inconsistencies: Score-Comparison Inconsistency and Pairwise Transitivity Inconsistency.<n>We propose TrustJudge, a probabilistic framework that addresses these limitations through two key innovations.
arXiv Detail & Related papers (2025-09-25T13:04:29Z) - Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking [56.27361644734853]
Knowledge Graph Question Answering systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning.<n>Despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues.<n>We introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls.<n>Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.
arXiv Detail & Related papers (2025-05-29T14:44:52Z) - Practical estimation of the optimal classification error with soft labels and calibration [52.1410307583181]
We extend a previous work that utilizes soft labels for estimating the Bayes error, the optimal error rate.<n>We tackle a more challenging problem setting: estimation with corrupted soft labels.<n>Our method is instance-free, i.e., we do not assume access to any input instances.
arXiv Detail & Related papers (2025-05-27T06:04:57Z) - Buggin: Automatic intrinsic bugs classification model using NLP and ML [0.0]
This paper employs Natural Language Processing (NLP) techniques to automatically identify intrinsic bugs.<n>We use two embedding techniques, seBERT and TF-IDF, applied to the title and description text of bug reports.<n>The resulting embeddings are fed into well-established machine learning algorithms such as Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors.
arXiv Detail & Related papers (2025-04-02T16:23:08Z) - QuCheck: A Property-based Testing Framework for Quantum Programs in Qiskit [0.5735035463793009]
Property-based testing has been previously proposed for quantum programs in Q# with QSharpCheck.<n>We propose QuCheck, an enhanced property-based testing framework in Qiskit.
arXiv Detail & Related papers (2025-03-28T17:30:09Z) - Uncertainty-aware Long-tailed Weights Model the Utility of Pseudo-labels for Semi-supervised Learning [50.868594148443215]
We propose an Uncertainty-aware Ensemble Structure (UES) to assess the utility of pseudo-labels for unlabeled samples.<n>UES is lightweight and architecture-agnostic, easily extending to various computer vision tasks, including classification and regression.
arXiv Detail & Related papers (2025-03-13T02:21:04Z) - EquiBench: Benchmarking Large Language Models' Understanding of Program Semantics via Equivalence Checking [55.81461218284736]
EquiBench is a new benchmark for evaluating large language models (LLMs)<n>It determines whether two programs produce identical outputs for all possible inputs.<n>We evaluate 19 state-of-the-art LLMs and find that the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
arXiv Detail & Related papers (2025-02-18T02:54:25Z) - CITADEL: Context Similarity Based Deep Learning Framework Bug Finding [36.34154201748415]
Existing deep learning (DL) framework testing tools have limited coverage on bug types.<n>We propose Citadel, a method that accelerates the finding of bugs in terms of efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-18T01:51:16Z) - Analyzing Quantum Programs with LintQ: A Static Analysis Framework for Qiskit [21.351834312054844]
This paper presents LintQ, a comprehensive static analysis framework for detecting bugs in quantum programs.
Our approach is enabled by a set of abstractions designed to reason about common concepts in quantum computing without referring to the underlying quantum computing platform.
We apply the approach to a newly collected dataset of 7,568 real-world Qiskit-based quantum programs, showing that LintQ effectively identifies various programming problems.
arXiv Detail & Related papers (2023-10-01T16:36:09Z) - An Empirical Study of Bugs in Quantum Machine Learning Frameworks [5.868747298750261]
We inspect 391 real-world bugs collected from 22 open-source repositories of nine popular QML frameworks.
28% of the bugs are quantum-specific, such as erroneous unitary matrix implementation.
We manually distilled a taxonomy of five symptoms and nine root cause of bugs in QML platforms.
arXiv Detail & Related papers (2023-06-10T07:26:34Z) - An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis [0.8621608193534838]
We study 3,358 buggy methods with different severity labels from 19 Java open-source projects.
Results show that code metrics are useful in predicting buggy code, but they cannot estimate the severity level of the bugs.
Our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity.
arXiv Detail & Related papers (2022-06-26T17:07:23Z) - Experimental violations of Leggett-Garg's inequalities on a quantum
computer [77.34726150561087]
We experimentally observe the violations of Leggett-Garg-Bell's inequalities on single and multi-qubit systems.
Our analysis highlights the limits of nowadays quantum platforms, showing that the above-mentioned correlation functions deviate from theoretical prediction as the number of qubits and the depth of the circuit grow.
arXiv Detail & Related papers (2021-09-06T14:35:15Z) - Solving correlation clustering with QAOA and a Rydberg qudit system: a
full-stack approach [94.37521840642141]
We study the correlation clustering problem using the quantum approximate optimization algorithm (QAOA) and qudits.
Specifically, we consider a neutral atom quantum computer and propose a full stack approach for correlation clustering.
We show the qudit implementation is superior to the qubit encoding as quantified by the gate count.
arXiv Detail & Related papers (2021-06-22T11:07:38Z) - Provable tradeoffs in adversarially robust classification [96.48180210364893]
We develop and leverage new tools, including recent breakthroughs from probability theory on robust isoperimetry.
Our results reveal fundamental tradeoffs between standard and robust accuracy that grow when data is imbalanced.
arXiv Detail & Related papers (2020-06-09T09:58:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.