Empirical Analysis of Temporal and Spatial Fault Characteristics in Multi-Fault Bug Repositories
- URL: http://arxiv.org/abs/2508.08872v1
- Date: Tue, 12 Aug 2025 11:55:16 GMT
- Title: Empirical Analysis of Temporal and Spatial Fault Characteristics in Multi-Fault Bug Repositories
- Authors: Dylan Callaghan, Alexandra van der Spuy, Bernd Fischer
- Abstract summary: We present an empirical analysis of the temporal and spatial characteristics of faults existing in 16 open-source Java and Python projects. Our findings show that many faults in these software systems are long-lived, leading to the majority of software versions having multiple coexisting faults.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fixing software faults contributes significantly to the cost of software maintenance and evolution. Techniques for reducing these costs require datasets of software faults, as well as an understanding of the faults, for optimal testing and evaluation. In this paper, we present an empirical analysis of the temporal and spatial characteristics of faults existing in 16 open-source Java and Python projects, which form part of the Defects4J and BugsInPy datasets, respectively. Our findings show that many faults in these software systems are long-lived, leading to the majority of software versions having multiple coexisting faults. This is in contrast to the assumptions of the original datasets, where the majority of versions only identify a single fault. In addition, we show that although the faults are found in only a small subset of the systems, these faults are often evenly distributed amongst this subset, leading to relatively few bug hotspots.
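The two kinds of measurement described in the abstract can be illustrated with a small Python sketch. It assumes, hypothetically, that each fault is recorded as a half-open interval of version indices [introduced, fixed) plus the set of files its fix touches. The `Fault` record, the helper functions, the toy data, and the hotspot rule (a file is a hotspot if it has more than twice the mean number of faults per faulty file) are all illustrative assumptions, not the authors' pipeline and not data from Defects4J or BugsInPy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record (illustrative, not a Defects4J/BugsInPy schema)."""
    fault_id: str
    introduced: int         # index of the first version that contains the fault
    fixed: int              # index of the version that fixes it (fault absent from here on)
    files: frozenset[str]   # files changed by the fix


def coexisting_faults(faults, num_versions):
    """Map each version index to the ids of faults present in that version."""
    present = {v: [] for v in range(num_versions)}
    for f in faults:
        # Half-open lifetime [introduced, fixed), clipped to the known history.
        for v in range(max(f.introduced, 0), min(f.fixed, num_versions)):
            present[v].append(f.fault_id)
    return present


def multi_fault_share(present):
    """Fraction of versions containing two or more coexisting faults."""
    if not present:
        return 0.0
    multi = sum(1 for ids in present.values() if len(ids) >= 2)
    return multi / len(present)


def faults_per_file(faults):
    """Count how many faults touch each file (files with zero faults are omitted)."""
    counts = Counter()
    for f in faults:
        counts.update(f.files)
    return counts


def hotspots(counts, factor=2.0):
    """Files whose fault count exceeds `factor` times the mean over faulty files.

    The threshold rule is an assumption made for this sketch only.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(name for name, n in counts.items() if n > factor * mean)


# Toy history of ten versions (made-up data for illustration).
faults = [
    Fault("F1", 0, 6, frozenset({"a.py"})),
    Fault("F2", 2, 4, frozenset({"b.py"})),
    Fault("F3", 3, 8, frozenset({"a.py", "c.py"})),
    Fault("F4", 7, 9, frozenset({"c.py"})),
]
present = coexisting_faults(faults, num_versions=10)
print(multi_fault_share(present))          # 0.5
print(hotspots(faults_per_file(faults)))   # []
```

In this toy history, half of the ten versions carry two or more faults at once because the long-lived faults F1 and F3 overlap with shorter ones, while the faults spread fairly evenly over the three faulty files, so no file crosses the hotspot threshold.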
Related papers
- Revisiting Multivariate Time Series Forecasting with Missing Values (2025-09-27)
  Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
- When Bugs Linger: A Study of Anomalous Resolution Time Outliers and Their Themes (2025-09-19)
  This study presents a comprehensive analysis of bug resolution anomalies across seven prominent open-source repositories. Our findings reveal consistent patterns across projects, with anomalies often clustering around test failures, enhancement requests, and user interface issues.
- From Bugs to Benchmarks: A Comprehensive Survey of Software Defect Datasets (2025-04-24)
  Software defect datasets are collections of software bugs and their associated information. Over the years, numerous software defect datasets have been developed, providing rich resources for the community. This article provides a comprehensive survey of 132 software defect datasets.
- An Anatomy of 488 Faults from Defects4J Based on the Control- and Data-Flow Graph Representations of Programs (2025-02-04)
  Software fault datasets such as Defects4J provide for each individual fault its location and repair, but do not characterize the faults. We propose a new, direct fault classification scheme based on the control- and data-flow graph representations of programs.
- SINDER: Repairing the Singular Defects of DINOv2 (2024-07-23)
  Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract. We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
- Mining Bug Repositories for Multi-Fault Programs (2024-03-28)
  We describe an extension to datasets in which multiple bugs are identified in individual entries. We use test case transplantation and fault location translation to expose and locate the bugs. We thus provide datasets of true multi-fault versions within real-world software projects.
- Applying Machine Learning Analysis for Software Quality Test (2023-05-16)
  It is critical to comprehend what triggers maintenance and whether it can be predicted. Numerous methods of assessing the complexity of created programs may produce useful prediction models. In this paper, machine learning is applied to the available data to calculate cumulative software failure levels.
- Shortcomings of Question Answering Based Factuality Frameworks for Error Localization (2022-10-13)
  We show that question answering (QA)-based factuality metrics fail to correctly identify error spans in generated summaries. Our analysis reveals a major reason for such poor localization: questions generated by the question generation (QG) module often inherit errors from non-factual summaries, which are then propagated further into downstream modules. Our experiments conclusively show that there exist fundamental issues with localization using the QA framework which cannot be fixed solely by stronger QA and QG models.
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (2022-05-25)
  In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare the performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
- A Fault Localization and Debugging Support Framework driven by Bug Tracking Data (2021-03-03)
  This thesis aims to provide a fault localization framework by combining data from various sources. To achieve this, a bug classification schema is introduced, benchmarks are created, and a novel fault localization method based on historical data is proposed.
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models (2021-02-23)
  Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, a major troubleshooting source of system information.