Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics
- URL: http://arxiv.org/abs/2405.07244v1
- Date: Sun, 12 May 2024 10:31:43 GMT
- Title: Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics
- Authors: Gábor Antal, Zoltán Tóth, Péter Hegedűs, Rudolf Ferenc
- Abstract summary: Bug prediction aims at finding source code elements in a software system that are likely to contain defects.
In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis-based metric of the number of incoming and outgoing function calls.
- Score: 0.7099737083842057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis-based metric of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2-10% increase in model performances (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performances; however, using them all together yields the best results.
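To make the setup concrete, below is a minimal sketch of the kind of function-level bug prediction experiment the abstract describes: an ML classifier trained on static metrics plus the hybrid invocation metrics HNII and HNOI. The file name, column names, and classifier choice are assumptions for illustration only; the paper evaluates several algorithms and hyper-parameter settings.

```python
# Minimal sketch (not the authors' pipeline): train a function-level bug
# predictor from static metrics plus hybrid invocation metrics (HNII, HNOI).
# The CSV name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

df = pd.read_csv("eslint_functions.csv")          # hypothetical export of BugsJS-based data
static_metrics = ["LOC", "McCC", "NOI", "NII"]    # example static metrics
hybrid_metrics = ["HNOI", "HNII"]                 # hybrid call-graph invocation metrics

X = df[static_metrics + hybrid_metrics]
y = df["buggy"]                                   # 1 = buggy function, 0 = non-buggy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, model.predict(X_test), average="binary")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Re-running this comparison with and without the HNOI/HNII columns mirrors, in spirit, the paper's contrast between purely static and hybrid feature sets.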
Related papers
- On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation [4.065344017083881]
We analyze the ability of embedding-based metrics like CodeBERTScore to measure functional correctness and other helpful constructs like editing effort.
Our results show that while they have a weak correlation with functional correctness (0.16), they are strongly correlated (0.72) with editing effort.
arXiv Detail & Related papers (2024-04-26T15:54:39Z)
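As a hedged illustration of the analysis above, one can correlate an embedding-based similarity score with a functional-correctness signal and with editing effort. The numbers below are synthetic, and the similarity scores are stand-ins for a metric such as CodeBERTScore.

```python
# Illustrative only: correlate an embedding-style similarity score with
# functional correctness (pass/fail) and with editing effort (edit distance).
# All values are synthetic stand-ins, not data from the paper.
from scipy.stats import pearsonr

similarity  = [0.91, 0.85, 0.78, 0.66, 0.95, 0.72, 0.88, 0.60]  # metric score per sample
passed      = [1,    1,    0,    0,    1,    0,    1,    0   ]  # did the code pass its tests?
edit_effort = [3,    5,    12,   20,   1,    15,   4,    25  ]  # edits needed to reach a correct version

r_correct, _ = pearsonr(similarity, passed)
r_effort, _  = pearsonr(similarity, edit_effort)
print(f"corr(similarity, correctness) = {r_correct:.2f}")
print(f"corr(similarity, edit effort) = {r_effort:.2f}")
```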
- Combined Static Analysis and Machine Learning Prediction for Application Debloating [2.010931857032585]
We develop a framework, Predictive Debloat with Static Guarantees (PDSG).
PDSG predicts the dynamic callee set emanating from a callsite, and to resolve mispredictions, it employs a lightweight audit based on static invariants of call chains.
It achieves the highest gadget reductions among similar techniques on SPEC CPU 2017, reducing 82.5% of the total gadgets on average.
arXiv Detail & Related papers (2024-03-30T00:14:17Z)
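Under heavy simplification, the PDSG idea sketched above can be pictured as a predicted callee set per callsite plus a fallback audit against statically derived call-graph facts. Every name and data structure below is an assumption made for illustration, not the authors' implementation.

```python
# Highly simplified sketch of the PDSG idea: allow a call if the learned
# prediction expects it; on a misprediction, audit it against a statically
# computed (sound) over-approximation of possible callees.
predicted_callees = {            # learned prediction: callsite -> likely callees
    "parse@main.c:42": {"parse_json", "parse_yaml"},
}
static_call_graph = {            # static over-approximation of the call graph
    "parse@main.c:42": {"parse_json", "parse_yaml", "parse_xml"},
}

def allow_call(callsite: str, callee: str) -> bool:
    if callee in predicted_callees.get(callsite, set()):
        return True                                  # fast path: prediction hit
    # misprediction: lightweight audit against the static invariant
    return callee in static_call_graph.get(callsite, set())

print(allow_call("parse@main.c:42", "parse_json"))   # True (predicted)
print(allow_call("parse@main.c:42", "parse_xml"))    # True (audited, statically possible)
print(allow_call("parse@main.c:42", "system"))       # False (blocked)
```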
- De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding [18.129031749321058]
Large language models (LLMs) trained on datasets of publicly available source code have established a new state of the art in code generation tasks.
LLMs are mostly unaware of the code that exists within a specific project, preventing the models from making good use of existing APIs.
This paper presents De-Hallucinator, a technique that grounds the predictions of an LLM through a novel combination of retrieving suitable API references.
arXiv Detail & Related papers (2024-01-03T12:09:43Z)
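A minimal, hedged sketch of retrieval-grounded prompting in the spirit of the summary above: retrieve project API signatures relevant to an initial prediction and re-prompt with them. The `generate` and `retrieve_apis` functions, the index, and the number of rounds are placeholders, not the paper's actual components.

```python
# Sketch of iterative, retrieval-grounded prompting. `generate` stands in for
# an LLM call; `retrieve_apis` stands in for a project-specific API index.
def generate(prompt: str) -> str:
    return "client.fetch_user(user_id)"          # placeholder LLM completion

def retrieve_apis(query: str, project_index: dict) -> list[str]:
    # naive lexical retrieval over known project API signatures
    return [sig for name, sig in project_index.items() if name in query]

def grounded_completion(task: str, project_index: dict, rounds: int = 2) -> str:
    prediction = generate(task)
    for _ in range(rounds):
        refs = retrieve_apis(prediction, project_index)
        prompt = "\n".join(["# Relevant project APIs:"] + refs + [task])
        prediction = generate(prompt)            # re-prompt with retrieved API references
    return prediction

index = {"fetch_user": "def fetch_user(self, user_id: int) -> User: ..."}
print(grounded_completion("Complete: load the user record", index))
```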
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
- Method-Level Bug Severity Prediction using Source Code Metrics and LLMs [0.628122931748758]
We investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels.
Our results suggest that Decision Tree and Random Forest models outperform other models across several evaluation metrics.
CodeBERT finetuning improves the bug severity prediction results significantly in the range of 29%-140% for several evaluation metrics.
arXiv Detail & Related papers (2023-09-06T14:38:07Z)
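A hedged sketch of the combination described above: hand-crafted code metrics are concatenated with a CodeBERT embedding of the method body and fed to a Random Forest severity classifier. The toy methods, metric columns, and severity labels are placeholders, and the encoder is used off the shelf rather than fine-tuned as in the paper.

```python
# Sketch: combine source code metrics with a CodeBERT embedding of the method
# body and train a Random Forest severity classifier. All data is synthetic.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.ensemble import RandomForestClassifier

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> np.ndarray:
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state[:, 0, :].squeeze(0).numpy()   # [CLS]-style vector

methods = ["int add(int a, int b) { return a + b; }",
           "void save(User u) { db.write(u); }"]               # toy method bodies
metrics = np.array([[5, 1, 0], [12, 3, 2]])                    # e.g. LOC, complexity, fan-in
severity = np.array([0, 1])                                    # e.g. 0 = minor, 1 = major

X = np.hstack([metrics, np.vstack([embed(m) for m in methods])])
clf = RandomForestClassifier(random_state=0).fit(X, severity)
```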
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
However, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
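The AST-based static check can be illustrated with Python's own `ast` module: a completion that fails to parse is flagged without ever being executed. This is a simplified stand-in for the paper's framework, which goes beyond parse errors.

```python
# Simplified stand-in for AST-based static evaluation of code completions:
# flag completions that do not parse, without executing them.
import ast

completions = [
    "def area(r):\n    return 3.14159 * r ** 2\n",       # parses fine
    "def area(r):\n    return 3.14159 * r ** \n",        # syntax error
]

for i, code in enumerate(completions):
    try:
        ast.parse(code)
        print(f"completion {i}: no static syntax errors")
    except SyntaxError as e:
        print(f"completion {i}: static error at line {e.lineno}: {e.msg}")
```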
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
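Self-Debugging can be pictured, in a hedged sketch, as a generate/execute/feedback loop: run the predicted program against a test and, if it fails, feed the failure message back into the prompt. The `generate` function below is a placeholder for a few-shot LLM call.

```python
# Sketch of a self-debugging loop: generate code, run a test, and feed any
# failure back to the model. `generate` stands in for a few-shot LLM call.
def generate(prompt: str) -> str:
    # placeholder: a real system would call an LLM here
    return "def add(a, b):\n    return a + b\n"

def run_test(code: str):
    env: dict = {}
    try:
        exec(code, env)                      # define the candidate function
        assert env["add"](2, 3) == 5         # unit test
        return None                          # success
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

prompt = "Write add(a, b) that returns the sum of two numbers."
for attempt in range(3):
    code = generate(prompt)
    feedback = run_test(code)
    if feedback is None:
        print(f"passed on attempt {attempt + 1}")
        break
    prompt += f"\n# The previous attempt failed with: {feedback}\n# Please fix it."
```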
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation, TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
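A hedged sketch of the reranking step described above: score each sampled program with a learned correctness predictor and return the highest-scoring sample as the single pass@1 answer. The predictor here is a trivial placeholder for the trained neural ranker.

```python
# Sketch of ranker-based sample selection: instead of returning an arbitrary
# sample, return the candidate the (placeholder) correctness predictor prefers.
def predicted_correctness(program: str) -> float:
    # placeholder for a trained fault-aware neural ranker
    return 0.9 if "return" in program else 0.1

samples = [
    "def square(x): x * x",                  # missing return
    "def square(x): return x * x",
    "def square(x): return x + x",
]

best = max(samples, key=predicted_correctness)
print("selected for pass@1:", best)
```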
- Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
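A minimal, hedged illustration of the confidence-interval angle mentioned above: use the empirical residuals of a performance predictor on held-out settings to attach an interval to a new point prediction. All numbers are synthetic.

```python
# Illustration of attaching an empirical interval to a performance prediction;
# the residuals and the point prediction are synthetic numbers.
import numpy as np

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 2.0, size=200)   # predictor errors on held-out settings (F1 points)
point_prediction = 87.4                       # predicted F1 for an unseen setting

low, high = point_prediction + np.percentile(residuals, [2.5, 97.5])
print(f"predicted F1 = {point_prediction:.1f}, 95% interval = [{low:.1f}, {high:.1f}]")
```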