Early Life Cycle Software Defect Prediction. Why? How?
- URL: http://arxiv.org/abs/2011.13071v3
- Date: Tue, 9 Feb 2021 01:13:15 GMT
- Title: Early Life Cycle Software Defect Prediction. Why? How?
- Authors: N.C. Shrikanth, Suvodeep Majumder and Tim Menzies
- Abstract summary: We analyzed hundreds of popular GitHub projects for 84 months.
Across these projects, most of the defects occur very early in their life cycle.
We hope these results inspire other researchers to adopt a "simplicity-first" approach to their work.
- Score: 37.48549087467758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many researchers assume that, for software analytics, "more data is better."
We write to show that, at least for learning defect predictors, this may not be
true. To demonstrate this, we analyzed hundreds of popular GitHub projects.
These projects ran for 84 months and contained 3,728 commits (median values).
Across these projects, most of the defects occur very early in their life
cycle. Hence, defect predictors learned from the first 150 commits and four
months perform just as well as anything else. This means that, at least for the
projects studied here, after the first few months, we need not continually
update our defect prediction models. We hope these results inspire other
researchers to adopt a "simplicity-first" approach to their work. Some domains
require a complex and data-hungry analysis. But before assuming complexity, it
is prudent to check the raw data, looking for "shortcuts" that can simplify the
analysis.
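The abstract's central claim is that a predictor learned from only the first 150 commits matches one learned from the whole history. The sketch below illustrates that idea on synthetic data; the single "size threshold" learner, the toy feature, and all counts except the 150-commit cutoff and the 3,728-commit median are assumptions for demonstration, not the paper's actual method.

```python
# Illustrative sketch: defects cluster early, so a predictor trained on just
# the first 150 commits can match one trained on the full history.
# The data and the threshold learner are toy assumptions for demonstration.
import random

def make_commit(i, n_total, rng):
    """Synthetic commit: defects cluster early, and defective commits
    tend to be larger (a toy assumption)."""
    early = i < n_total // 4
    defect = rng.random() < (0.4 if early else 0.1)
    size = rng.gauss(300 if defect else 100, 50)  # lines changed
    return (size, int(defect))

def train_threshold(data):
    """Learn the single size threshold that best separates the classes."""
    best_t, best_acc = 0.0, 0.0
    for t in range(0, 500, 10):
        acc = sum((x > t) == bool(y) for x, y in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return lambda x: int(x > best_t)

rng = random.Random(0)
commits = [make_commit(i, 3728, rng) for i in range(3728)]  # median from paper

early_model = train_threshold(commits[:150])  # "first 150 commits"
full_model = train_threshold(commits)         # entire history
holdout = commits[150:]

acc_early = sum(early_model(x) == y for x, y in holdout) / len(holdout)
acc_full = sum(full_model(x) == y for x, y in holdout) / len(holdout)
print(f"early-data model: {acc_early:.3f}  full-data model: {acc_full:.3f}")
```

Because the signal is concentrated in the early commits, the two models land on nearly the same decision boundary, which is the "simplicity-first" shortcut the abstract describes.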
Related papers
- Language models scale reliably with over-training and on downstream tasks [121.69867718185125]
Scaling laws are useful guides for derisking expensive training runs.
However, there remain gaps between current studies and how language models are trained.
In addition, scaling laws mostly predict loss, but models are ultimately compared on downstream task performance.
arXiv Detail & Related papers (2024-03-13T13:54:00Z) - Zero-Regret Performative Prediction Under Inequality Constraints [5.513958040574729]
This paper studies performative prediction under inequality constraints.
We develop a robust primal-dual framework that requires only approximate solutions up to a certain accuracy.
We then propose an adaptive primal-dual algorithm for location families.
arXiv Detail & Related papers (2023-09-22T04:54:26Z) - Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z) - Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health [13.19204187502255]
This paper only explores the application of niSNEAK to project health. That said, we see nothing in principle that prevents the application of this technique to a wider range of problems.
arXiv Detail & Related papers (2023-01-16T19:27:16Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have the desired properties, admit a natural error measure, and yield algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - Graph-Based Machine Learning Improves Just-in-Time Defect Prediction [0.38073142980732994]
We use graph-based machine learning to improve Just-In-Time (JIT) defect prediction.
We show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55%.
This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction.
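The entry above reports results as F1 and MCC scores. For reference, here is how both metrics are computed from a binary confusion matrix; the counts below are made up for illustration and are not the paper's results.

```python
# F1 and Matthews correlation coefficient (MCC) from a confusion matrix.
# The counts passed in at the bottom are illustrative, not from the paper.
import math

def f1_and_mcc(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return f1, mcc

f1, mcc = f1_and_mcc(tp=80, fp=20, fn=25, tn=875)
print(f"F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

Unlike F1, MCC accounts for true negatives, which is why defect-prediction studies on imbalanced commit data often report both.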
arXiv Detail & Related papers (2021-10-11T16:00:02Z) - Learning to Predict Trustworthiness with Steep Slope Loss [69.40817968905495]
We study the problem of predicting trustworthiness on real-world large-scale datasets.
We observe that trustworthiness predictors trained with prior-art loss functions are prone to view both correct and incorrect predictions as trustworthy.
We propose a novel steep slope loss to separate the features w.r.t. correct predictions from the ones w.r.t. incorrect predictions by two slide-like curves that oppose each other.
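To make the idea of a steep-slope penalty concrete, here is an illustrative sketch built from a steep sigmoid. This is NOT the paper's exact loss: the sigmoid form, the slope parameter `k`, and the margin `m` are all assumptions chosen only to show how a sharp transition penalizes incorrect predictions much more than correct ones at the same score.

```python
# Illustrative steep-slope-style penalty (not the paper's exact formulation).
# A steep sigmoid approximates a step: correct predictions are pushed above
# the margin, incorrect ones below it. k and m are illustrative assumptions.
import math

def steep_slope_penalty(score, is_correct, k=8.0, m=0.0):
    # Steep sigmoid: approaches a step function as k grows.
    sig = 1.0 / (1.0 + math.exp(-k * (score - m)))
    # Penalize low scores on correct predictions, high scores on incorrect.
    return (1.0 - sig) if is_correct else sig

# A correct prediction with a high trust score incurs almost no penalty...
print(steep_slope_penalty(1.0, True))    # near 0
# ...while an incorrect prediction with the same score is penalized heavily.
print(steep_slope_penalty(1.0, False))   # near 1
```

The steepness is the point: a gentle slope would let correct and incorrect predictions share similar scores, which is exactly the failure mode the entry above describes for prior-art losses.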
arXiv Detail & Related papers (2021-09-30T19:19:09Z) - The Early Bird Catches the Worm: Better Early Life Cycle Defect Predictors [23.22715542777918]
In 240 GitHub projects, we find that the information in that data "clumps" towards the earliest parts of the project.
A defect prediction model learned from just the first 150 commits works as well, or better than state-of-the-art alternatives.
arXiv Detail & Related papers (2021-05-24T03:49:09Z) - Revisiting Process versus Product Metrics: a Large Scale Analysis [32.37197747513998]
We recheck prior small-scale results using 722,471 commits from 700 Github projects.
We find that some analytics in-the-small conclusions still hold when scaling up to analytics in-the-large.
We warn that it is unwise to trust metric importance results from analytics in-the-small studies.
arXiv Detail & Related papers (2020-08-21T16:26:22Z) - Probabilistic Regression for Visual Tracking [193.05958682821444]
We propose a probabilistic regression formulation and apply it to tracking.
Our network predicts the conditional probability density of the target state given an input image.
Our tracker sets a new state-of-the-art on six datasets, achieving 59.8% AUC on LaSOT and 75.8% Success on TrackingNet.
arXiv Detail & Related papers (2020-03-27T17:58:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.