Leveraging Structural Properties of Source Code Graphs for Just-In-Time
Bug Prediction
- URL: http://arxiv.org/abs/2201.10137v1
- Date: Tue, 25 Jan 2022 07:20:47 GMT
- Authors: Md Nadim, Debajyoti Mondal, Chanchal K. Roy
- Abstract summary: A graph is one of the most commonly used representations for understanding relational data.
In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
- Score: 6.467090475885797
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The most common use of data visualization is to reduce complexity so
that data can be properly understood. A graph is one of the most commonly used
representations for understanding relational data. It produces a simplified
representation of data that is challenging to comprehend if kept in a textual
format. In this study, we propose a methodology that uses the relational
properties of source code, represented as a graph, to perform Just-in-Time
(JIT) bug prediction in software systems across the revisions made during
software evolution and maintenance. We present a method to convert the source
code of commit patches into an equivalent graph representation, which we call
the Source Code Graph (SCG). To understand and compare multiple source code
graphs, we extracted
several structural properties of these graphs, such as the density, number of
cycles, nodes, edges, etc. We then used the attribute values of those SCGs
to visualize and detect buggy software commits. We processed more than 246K
software commits from 12 subject systems in this investigation. Our
investigation of these 12 open-source software projects, written in C++ and
Java, shows that combining the features from SCG with the conventional
features used in similar studies improves the performance of Machine Learning
(ML) based buggy commit detection models. We also find that the increase in
F1 scores for predicting buggy and non-buggy commits is statistically
significant under the Wilcoxon Signed Rank Test. Since SCG-based feature
values represent the style or structural properties of source code updates or
changes in the software system, this result suggests the importance of
careful maintenance of source code style or structure for keeping a software
system bug-free.
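The abstract lists density and the numbers of cycles, nodes, and edges as SCG features but does not say how they are computed. The following is a minimal sketch of one possible computation, assuming the graph is available as a list of directed `(source, target)` pairs. The function name `scg_features`, the edge-list input, and the use of the cyclomatic number (independent cycles of the underlying undirected graph) as the cycle count are all illustrative assumptions, not the authors' definitions.

```python
def scg_features(edges):
    """Structural features of a directed graph given as (src, dst) pairs.

    Hypothetical helper for illustration only; the paper's own SCG
    construction and feature definitions may differ.
    """
    nodes = set()
    directed = set()
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        if src != dst:  # assumption: self-loops are ignored
            directed.add((src, dst))

    n, m = len(nodes), len(directed)
    # Directed density: share of the n*(n-1) possible ordered pairs present.
    density = m / (n * (n - 1)) if n > 1 else 0.0

    # Underlying undirected graph, used for components and the cycle count.
    undirected = {frozenset(edge) for edge in directed}
    adjacency = {node: set() for node in nodes}
    for src, dst in directed:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    # Count connected components with an iterative depth-first search.
    components = 0
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    # Cyclomatic number: undirected edges - nodes + components.
    independent_cycles = len(undirected) - n + components

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "independent_cycles": independent_cycles,
    }


# Made-up example graph: a triangle a -> b -> c -> a with a tail c -> d.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(scg_features(example_edges))
```

A feature dictionary of this kind, computed per commit, could be appended to the conventional commit-level features before training a classifier, which is the combination the abstract reports as beneficial.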
Related papers
- GC-Bench: An Open and Unified Benchmark for Graph Condensation [54.70801435138878]
We develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation.
GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity.
We have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research.
arXiv Detail & Related papers (2024-06-30T07:47:34Z)
- CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations.
We demonstrate its effectiveness in code smell detection as an illustrative use case.
CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
- Semantic Code Graph -- an information model to facilitate software comprehension [0.0]
There is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs.
While a variety of code structure models already exist, there is a surprising lack of models that closely represent the source code.
We propose the Semantic Code Graph (SCG), an information model that offers a detailed abstract representation of code dependencies.
arXiv Detail & Related papers (2023-10-03T15:09:49Z)
- DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities [12.460745260973837]
Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry.
Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities.
In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph.
arXiv Detail & Related papers (2023-06-02T08:57:13Z)
- A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction [4.572330678291241]
We develop a unified active learning framework specializing in software performance prediction.
We investigate the impact of using different levels of information for active and passive learning.
Our approach aims to improve the investment in AI models for different software performance predictions.
arXiv Detail & Related papers (2023-04-06T14:00:48Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
We propose a learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs.
We collect a large-scale dataset from GitHub containing 41,152 code snippets written in C language.
arXiv Detail & Related papers (2021-03-24T06:57:44Z)
- Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)