Related papers: Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel

URL: http://arxiv.org/abs/2308.05060v2
Date: Fri, 7 Jun 2024 10:22:24 GMT
Title: Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel
Authors: Yunbo Lyu, Hong Jin Kang, Ratnadira Widyasari, Julia Lawall, David Lo
Abstract summary: The evaluation of how ghost commits impact the SZZ algorithm remains limited. Linux kernel developers have started labelling bug-fixing patches with the commit identifiers of the corresponding bug-inducing commit(s) as a standard practice. In this paper, we apply six SZZ algorithms to 76,046 pairs of bug-fixing patches and bug-inducing commits from the Linux kernel.
Score: 8.698309437598944
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The SZZ algorithm is used to connect bug-fixing commits to the earlier commits that introduced bugs. This algorithm has many applications and many variants have been devised. However, there are some types of commits that cannot be traced by the SZZ algorithm, referred to as "ghost commits". The evaluation of how these ghost commits impact the SZZ algorithm remains limited. Moreover, these algorithms have been evaluated on datasets created by software engineering researchers from information in bug trackers and version controlled histories. Since Oct 2013, the Linux kernel developers have started labelling bug-fixing patches with the commit identifiers of the corresponding bug-inducing commit(s) as a standard practice. As of v6.1-rc5, 76,046 pairs of bug-fixing patches and bug-inducing commits are available. This provides a unique opportunity to evaluate the SZZ algorithm on a large dataset that has been created and reviewed by project developers, entirely independently of the biases of software engineering researchers. In this paper, we apply six SZZ algorithms to 76,046 pairs of bug-fixing patches and bug-introducing commits from the Linux kernel. Our findings reveal that SZZ algorithms experience a more significant decline in recall on our dataset (13.8%) as compared to prior findings reported by Rosa et al., and the disparities between the individual SZZ algorithms diminish. Moreover, we find that 17.47% of bug-fixing commits are ghost commits. Finally, we propose Tracing-Commit SZZ (TC-SZZ), that traces all commits in the change history of lines modified or deleted in bug-fixing commits. Applying TC-SZZ to all failure cases, excluding ghost commits, we found that TC-SZZ could identify 17.7% of them. Our further analysis found that 34.6% of bug-inducing commits were in the function history, 27.5% in the file history (but not in the function history), and...
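The developer-provided labels mentioned in the abstract appear in kernel commit messages as "Fixes:" lines that name the bug-inducing commit. The following is a minimal Python sketch, not the authors' tooling, of how such labels could be extracted and used to score the output of an SZZ implementation. The regular expression, the function names, the commit hashes, and the simple per-label definition of recall are all assumptions made for illustration; the paper may define recall differently.

```python
import re

# Assumed pattern for kernel-style trailers such as:
#   Fixes: 1a2b3c4d5e6f ("net: add example path")
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE)


def extract_fixes(message: str) -> list[str]:
    """Return the commit identifiers named in Fixes: lines, lowercased."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def same_commit(a: str, b: str) -> bool:
    """Treat two hashes as equal if one is an abbreviation (prefix) of the other."""
    return a.startswith(b) or b.startswith(a)


def recall(predicted: list[str], labelled: list[str]) -> float:
    """Fraction of labelled bug-inducing commits found among the predictions."""
    if not labelled:
        return 0.0
    hits = sum(any(same_commit(p, l) for p in predicted) for l in labelled)
    return hits / len(labelled)


# Hypothetical bug-fixing commit message; the hashes are made up.
msg = """net: fix null dereference in example path

Fixes: 1a2b3c4d5e6f ("net: add example path")
Signed-off-by: A Developer <dev@example.org>
"""

labelled = extract_fixes(msg)
# Hypothetical output of an SZZ implementation for the same fix.
predicted = ["1a2b3c4d5e6f7890abcd", "ffffffffffff"]
print(labelled, recall(predicted, labelled))
```

Matching by prefix reflects the fact that "Fixes:" lines usually carry abbreviated hashes, whereas a tool working on the repository would typically report full ones.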

Related papers

Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks [7.676213873923721]
We propose a method called RC-Detection to detect root-cause deletion lines in changed code lines. RC-Detection is used to detect root-cause deletion lines in changed code lines, thereby identifying the root cause of introduced bugs in bug-fixing commits. Our experiments show that, compared to the most advanced root cause detection methods, RC-Detection improved Recall@1, Recall@2, Recall@3, and MFR by 4.107%, 5.113%, 4.289%, and 24.536%, respectively.
arXiv Detail & Related papers (2025-05-02T04:29:09Z)
LLM4SZZ: Enhancing SZZ Algorithm with Context-Enhanced Assessment on Large Language Models [10.525352489242398]
The SZZ algorithm is the dominant technique for identifying bug-inducing commits. It serves as a foundation for many software engineering studies, such as bug prediction and static code analysis. Recently, a deep learning-based SZZ algorithm has been introduced to enhance the original SZZ algorithm.
arXiv Detail & Related papers (2025-04-02T06:40:57Z)
GraphFuzz: Automated Testing of Graph Algorithm Implementations with Differential Fuzzing and Lightweight Feedback [7.099737083842058]
We introduce GraphFuzz, the first automated feedback-guided fuzzing framework for graph algorithm implementations. Our key innovation lies in identifying lightweight and algorithm-specific feedback signals to combine with or completely replace the code coverage feedback. GraphFuzz applies differential testing to detect both crash-triggering bugs and logic bugs.
arXiv Detail & Related papers (2025-02-21T02:47:05Z)
WIA-SZZ: Work Item Aware SZZ [3.7232697932311645]
Existing SZZ algorithms identify the potential commit that induced a bug when given a fixing commit as input. We build a new variant of SZZ that leverages our work item detecting commits to first suggest bug-inducing commits. Our evaluation reveals 64% accuracy in finding work items and, most importantly, the approach is able to find many bug-inducing commits.
arXiv Detail & Related papers (2024-11-19T18:59:14Z)
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? [90.30635552818875]
We present Touchstone, a large-scale collaborative segmentation benchmark of 9 types of abdominal organs. This benchmark is based on 5,195 training CT scans from 76 hospitals around the world and 5,903 testing CT scans from 11 additional hospitals. We invited 14 inventors of 19 AI algorithms to train their algorithms, while our team, as a third party, independently evaluated these algorithms on three test sets.
arXiv Detail & Related papers (2024-11-06T05:09:34Z)
CITADEL: Context Similarity Based Deep Learning Framework Bug Finding [36.34154201748415]
Existing deep learning (DL) framework testing tools have limited coverage on bug types. We propose Citadel, a method that accelerates the finding of bugs in terms of efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-18T01:51:16Z)
Identifying Defect-Inducing Changes in Visual Code [54.20154707138088]
"SZZ Visual Code" (SZZ-VC) is an algorithm that finds changes in visual code based on the differences of graphical elements rather than differences of lines to detect defect-inducing changes. We validated the algorithm for an industry-made AAA video game and 20 music visual programming defects across 12 open source projects.
arXiv Detail & Related papers (2023-09-07T00:12:28Z)
ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems. We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness. Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
Multi-Granularity Detector for Vulnerability Fixes [13.653249890867222]
We propose MiDas (Multi-Granularity Detector for Vulnerability Fixes) to identify vulnerability-fixing commits. MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level. MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets.
arXiv Detail & Related papers (2023-05-23T10:06:28Z)
What Happens When We Fuzz? Investigating OSS-Fuzz Bug History [0.9772968596463595]
We analyzed 44,102 reported issues made public by OSS-Fuzz prior to March 12, 2022. We identified the bug-contributing commits to estimate when the bug containing code was introduced, and measure the timeline from introduction to detection to fix.
arXiv Detail & Related papers (2023-05-19T05:15:36Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task. We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables. Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
arXiv Detail & Related papers (2022-11-02T04:42:21Z)
Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience. We assume a kernelized structure where the kernel is unknown but shared across all tasks. Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.