Related papers: Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

URL: http://arxiv.org/abs/2201.10137v1
Date: Tue, 25 Jan 2022 07:20:47 GMT
Title: Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction
Authors: Md Nadim, Debajyoti Mondal, Chanchal K. Roy
Abstract summary: A graph is one of the most commonly used representations for understanding relational data. In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
Score: 6.467090475885797
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The most common use of data visualization is to minimize the complexity for proper understanding. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that is challenging to comprehend if kept in a textual format. In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph to support Just-in-Time (JIT) bug prediction in software systems across different revisions during software evolution and maintenance. We present a method to convert the source code of commit patches into equivalent graph representations, which we name Source Code Graphs (SCGs). To understand and compare multiple source code graphs, we extracted several structural properties of these graphs, such as the density, number of cycles, nodes, edges, etc. We then utilized the attribute values of those SCGs to visualize and detect buggy software commits. We process more than 246K software commits from 12 subject systems in this investigation. Our investigation on these 12 open-source software projects written in the C++ and Java programming languages shows that combining the features from SCGs with the conventional features used in similar studies increases the performance of Machine Learning (ML) based buggy commit detection models. We also find, using the Wilcoxon Signed Rank Test, that the increase in F1 scores for predicting buggy and non-buggy commits is statistically significant. Since SCG-based feature values represent the style or structural properties of source code updates or changes in the software system, this suggests the importance of careful maintenance of source code style or structure for keeping a software system bug-free.
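The structural properties named in the abstract (nodes, edges, density, number of cycles) can be illustrated with a minimal sketch. This is not the authors' SCG construction or feature extractor: the function name `graph_properties`, the toy edge list, and the choice to count cycles as the number of independent cycles (circuit rank) of the underlying undirected graph are all assumptions made for illustration.

```python
def graph_properties(edges):
    """Compute simple structural properties of a directed graph.

    Assumption: `edges` is an iterable of (source, target) pairs of hashable
    node labels. This is a hypothetical stand-in for a Source Code Graph.
    """
    # Drop self-loops and duplicate edges so that density stays within [0, 1].
    edge_set = {(u, v) for u, v in edges if u != v}
    nodes = {node for edge in edge_set for node in edge}
    n, m = len(nodes), len(edge_set)

    # Directed density: share of the n * (n - 1) possible edges that exist.
    density = m / (n * (n - 1)) if n > 1 else 0.0

    # Count weakly connected components with a small union-find structure.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edge_set:
        parent[find(u)] = find(v)
    components = len({find(v) for v in nodes})

    # Assumed cycle measure: circuit rank E - V + C of the undirected graph.
    undirected_edges = {frozenset(edge) for edge in edge_set}
    independent_cycles = len(undirected_edges) - n + components

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "independent_cycles": independent_cycles,
    }


# Toy graph: a three-node cycle a -> b -> c -> a plus one extra edge c -> d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
props = graph_properties(toy_edges)
print(props)
```

Feature dictionaries of this kind, computed once per commit, could be appended to conventional commit-level features before training a classifier. How the paper itself defines and counts cycles is not stated in the abstract.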

Related papers

GC-Bench: An Open and Unified Benchmark for Graph Condensation [54.70801435138878]
We develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation. GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity. We have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research.
arXiv Detail & Related papers (2024-06-30T07:47:34Z)
CONCORD: Towards a DSL for Configurable Graph Code Representation [3.756550107432323]
We introduce CONCORD, a domain-specific language to build customizable graph representations. We demonstrate its effectiveness in code smell detection as an illustrative use case. CONCORD will help researchers create and experiment with customizable graph-based code representations.
arXiv Detail & Related papers (2024-01-31T16:16:48Z)
Semantic Code Graph -- an information model to facilitate software comprehension [0.0]
There is an increasing need to accelerate the code comprehension process to facilitate maintenance and reduce associated costs. While a variety of code structure models already exist, there is a surprising lack of models that closely represent the source code. We propose the Semantic Code Graph (SCG), an information model that offers a detailed abstract representation of code dependencies.
arXiv Detail & Related papers (2023-10-03T15:09:49Z)
DSHGT: Dual-Supervisors Heterogeneous Graph Transformer -- A pioneer study of using heterogeneous graph learning for detecting software vulnerabilities [12.460745260973837]
Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry. Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph.
arXiv Detail & Related papers (2023-06-02T08:57:13Z)
A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction [4.572330678291241]
We develop a unified active learning framework specializing in software performance prediction. We investigate the impact of using different levels of information for active and passive learning. Our approach aims to improve the investment in AI models for different software performance predictions.
arXiv Detail & Related papers (2023-04-06T14:00:48Z)
Variational Graph Generator for Multi-View Graph Clustering [51.89092260088973]
We propose Variational Graph Generator for Multi-View Graph Clustering (VGMGC). This generator infers a reliable variational consensus graph based on an a priori assumption over multiple graphs. It embeds the inferred view-common graph and view-specific graphs together with features.
arXiv Detail & Related papers (2022-10-13T13:19:51Z)
GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations. This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search [15.19181807445119]
We propose a learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs. We collect a large-scale dataset from GitHub containing 41,152 code snippets written in C language.
arXiv Detail & Related papers (2021-03-24T06:57:44Z)
Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens. We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph. It is updated by decoding in the context of an auto-encoder. Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.