Related papers: Fast Fixes and Faulty Drivers: An Empirical Analysis of Regression Bug Fixing Times in the Linux Kernel

Fast Fixes and Faulty Drivers: An Empirical Analysis of Regression Bug Fixing Times in the Linux Kernel

URL: http://arxiv.org/abs/2411.02091v1
Date: Mon, 04 Nov 2024 13:53:29 GMT
Title: Fast Fixes and Faulty Drivers: An Empirical Analysis of Regression Bug Fixing Times in the Linux Kernel
Authors: Jukka Ruohonen, Adam Alami,
Abstract summary: The paper focuses on regression bug tracking in the kernel by considering the time required to fix regression bugs. The dataset examined is based on the regzbot automation framework for tracking regressions in the Linux kernel.
Score: 3.1959458747110054
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Regression bugs refer to situations in which something that worked previously no longer works currently. Such bugs have been pronounced in the Linux kernel. The paper focuses on regression bug tracking in the kernel by considering the time required to fix regression bugs. The dataset examined is based on the regzbot automation framework for tracking regressions in the Linux kernel. According to the results, (i) regression bug fixing times have been faster than previously reported; between 2021 and 2024, on average, it has taken less than a month to fix regression bugs. It is further evident that (ii) device drivers constitute the most prone subsystem for regression bugs, and also the fixing times vary across the kernel's subsystems. Although (iii) most commits fixing regression bugs have been reviewed, tested, or both, the kernel's code reviewing and manual testing practices do not explain the fixing times. Likewise, (iv) there is only a weak signal that code churn might contribute to explaining the fixing times statistically. Finally, (v) some subsystems exhibit strong effects for explaining the bug fixing times statistically, although overall statistical performance is modest but not atypical to the research domain. With these empirical results, the paper contributes to the efforts to better understand software regressions and their tracking in the Linux kernel.

Related papers

CrashFixer: A crash resolution agent for the Linux kernel [58.152358195983155]
This work builds upon kGym, which shares a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel. This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs.
arXiv Detail & Related papers (2025-04-29T04:18:51Z)
Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions [7.893728124841138]
numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. We introduce a novel idea, namely soft assertions (SA), to encode safety/error conditions for the places where numerical instability can occur.
arXiv Detail & Related papers (2025-04-22T00:55:33Z)
KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
Leveraging Stack Traces for Spectrum-based Fault Localization in the Absence of Failing Tests [44.13331329339185]
We introduce a new approach, SBEST, that integrates stack trace data with test coverage to enhance fault localization. Our approach shows a significant improvement, increasing Mean Average Precision (MAP) by 32.22% and Mean Reciprocal Rank (MRR) by 17.43% over traditional stack trace ranking methods.
arXiv Detail & Related papers (2024-05-01T15:15:52Z)
ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task. We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables. Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
arXiv Detail & Related papers (2022-11-02T04:42:21Z)
Infrared: A Meta Bug Detector [10.541969253100815]
We propose a new approach, called meta bug detection, which offers three crucial advantages over existing learning-based bug detectors. Our evaluation shows our meta bug detector (MBD) is effective in catching a variety of bugs including null pointer dereference, array index out-of-bound, file handle leak, and even data races in concurrent programs.
arXiv Detail & Related papers (2022-09-18T09:08:51Z)
An Empirical Study on Bug Severity Estimation using Source Code Metrics and Static Analysis [0.8621608193534838]
We study 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that code metrics are useful in predicting buggy code, but they cannot estimate the severity level of the bugs. Our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity.
arXiv Detail & Related papers (2022-06-26T17:07:23Z)
VSAC: Efficient and Accurate Estimator for H and F [68.65610177368617]
VSAC is a RANSAC-type robust estimator with a number of novelties. It is significantly faster than all its predecessors and runs on average in 1-2 ms, on a CPU. It is two orders of magnitude faster and yet as precise as MAGSAC++, the currently most accurate estimator of two-view geometry.
arXiv Detail & Related papers (2021-06-18T17:04:57Z)
Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in the NLP model updates. We formulate the regression-free model updates into a constrained optimization problem. We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub. We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch. We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets. Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
Detecting Rewards Deterioration in Episodic Reinforcement Learning [63.49923393311052]
In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible. We consider an episodic framework, where the rewards within each episode are not independent, nor identically-distributed, nor Markov. We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power.
arXiv Detail & Related papers (2020-10-22T12:45:55Z)
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE) In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement? We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.