Predicting Line-Level Defects by Capturing Code Contexts with
Hierarchical Transformers
- URL: http://arxiv.org/abs/2312.11889v1
- Date: Tue, 19 Dec 2023 06:25:04 GMT
- Title: Predicting Line-Level Defects by Capturing Code Contexts with
Hierarchical Transformers
- Authors: Parvez Mahbub and Mohammad Masudur Rahman
- Abstract summary: Bugsplorer is a novel deep-learning technique for line-level defect prediction.
It can rank the first 20% defective lines within the top 1-3% suspicious lines.
It has the potential to significantly reduce SQA costs by ranking defective lines higher.
- Score: 4.73194777046253
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Software defects consume 40% of the total budget in software development and
cost the global economy billions of dollars every year. Unfortunately, despite
the use of many software quality assurance (SQA) practices in software
development (e.g., code review, continuous integration), defects may still
exist in the official release of a software product. Therefore, prioritizing
SQA efforts for the vulnerable areas of the codebase is essential to ensure the
high quality of a software release. Predicting software defects at the line
level could help prioritize the SQA effort but is a highly challenging task
given that only ~3% of lines of a codebase could be defective. Existing works
on line-level defect prediction often fall short and cannot fully leverage the
line-level defect information. In this paper, we propose Bugsplorer, a novel
deep-learning technique for line-level defect prediction. It leverages a
hierarchical structure of transformer models to represent two types of code
elements: code tokens and code lines. Unlike the existing techniques that are
optimized for file-level defect prediction, Bugsplorer is optimized for a
line-level defect prediction objective. Our evaluation with five performance
metrics shows that Bugsplorer has a promising capability of predicting
defective lines with 26-72% better accuracy than that of the state-of-the-art
technique. It can rank the first 20% defective lines within the top 1-3%
suspicious lines. Thus, Bugsplorer has the potential to significantly reduce
SQA costs by ranking defective lines higher.
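The two-stage hierarchy described in the abstract (a token-level encoder producing one vector per line, followed by a line-level encoder that contextualizes lines against each other) can be sketched as below. This is a minimal single-head NumPy illustration with random, untrained weights; the dimensions, mean-pooling, and sigmoid head are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a sequence x: (seq, d).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 16                    # embedding width (hypothetical)
n_lines, n_tokens = 5, 8  # a toy file: 5 lines of 8 tokens each

# Hypothetical token embeddings for each line: (n_lines, n_tokens, d).
tokens = rng.normal(size=(n_lines, n_tokens, d))

# Stage 1: token-level encoder, then mean-pool tokens into one vector per line.
w_tok = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
line_vecs = np.stack([self_attention(line, *w_tok).mean(axis=0) for line in tokens])

# Stage 2: line-level encoder contextualizes each line against the others.
w_line = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
encoded = self_attention(line_vecs, *w_line)

# Per-line defect probability from a linear head with sigmoid (untrained).
defect_scores = 1 / (1 + np.exp(-(encoded @ rng.normal(size=d))))
ranking = np.argsort(-defect_scores)  # most suspicious lines first
```

Training end-to-end against per-line labels is what distinguishes this setup from file-level models that only see a file-level objective.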
Related papers
- Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase [10.961209762486684]
Code revert prediction aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development.
Previous methods for code defect detection relied on independent features but ignored relationships between code scripts.
This paper presents a systematic empirical study for code revert prediction that integrates the code import graph with code features.
arXiv Detail & Related papers (2024-03-14T15:54:29Z)
- BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction [0.0]

This paper presents a line-level defect prediction method grounded in a code bilinear attention fusion framework (BAFLineDP).
Our results demonstrate that BAFLineDP outperforms current advanced file-level and line-level defect prediction approaches.
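Line-level defect predictors such as Bugsplorer and BAFLineDP are typically compared by ranking lines by predicted suspiciousness and checking how many truly defective lines fall within the top k% of the ranking. A minimal sketch of such a recall@top-k% metric, using hypothetical scores and labels:

```python
def recall_at_top_k(scores, labels, k_percent):
    # Fraction of all defective lines found within the top k% of lines
    # when ranked by predicted suspiciousness (highest score first).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    cutoff = max(1, round(len(scores) * k_percent / 100))
    hits = sum(labels[i] for i in ranked[:cutoff])
    total = sum(labels)
    return hits / total if total else 0.0

# Toy example: 10 lines, lines at indices 2 and 7 are defective.
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.05, 0.2, 0.8, 0.4, 0.1]
labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(recall_at_top_k(scores, labels, 20))  # top 2 lines catch both defects -> 1.0
```

Under this kind of metric, Bugsplorer's claim of ranking the first 20% of defective lines within the top 1-3% of suspicious lines corresponds to a high recall at a very small k.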
arXiv Detail & Related papers (2024-02-11T09:01:42Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework.
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- PrAIoritize: Automated Early Prediction and Prioritization of Vulnerabilities in Smart Contracts [1.081463830315253]
Smart contracts are prone to numerous security threats due to undisclosed vulnerabilities and code weaknesses.
Efficient prioritization is crucial for smart contract security.
Our research aims to provide an automated approach, PrAIoritize, for prioritizing and predicting critical code weaknesses.
arXiv Detail & Related papers (2023-08-21T23:30:39Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- A Universal Error Measure for Input Predictions Applied to Online Graph Problems [57.58926849872494]
We introduce a novel measure for quantifying the error in input predictions.
The measure captures errors due to absent predicted requests as well as unpredicted actual requests.
arXiv Detail & Related papers (2022-05-25T15:24:03Z)
- HEDP: A Method for Early Forecasting Software Defects based on Human Error Mechanisms [1.0660480034605238]
The main process behind a software defect is that an error-prone scenario triggers human error modes.
The proposed idea emphasizes predicting the exact location and form of a possible defect.
arXiv Detail & Related papers (2021-10-13T14:44:23Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
- Machine Learning Techniques for Software Quality Assurance: A Survey [5.33024001730262]
We discuss various approaches in both fault prediction and test case prioritization.
Recent studies show that deep learning algorithms for fault prediction help bridge the gap between programs' semantics and fault prediction features.
arXiv Detail & Related papers (2021-04-29T00:37:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.