The Vulnerability Is in the Details: Locating Fine-grained Information
of Vulnerable Code Identified by Graph-based Detectors
- URL: http://arxiv.org/abs/2401.02737v2
- Date: Wed, 21 Feb 2024 08:21:43 GMT
- Authors: Baijun Cheng, Kailong Wang, Cuiyun Gao, Xiapu Luo, Yulei Sui, Li Li,
Yao Guo, Xiangqun Chen, Haoyu Wang
- Abstract summary: VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets.
It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
- Score: 39.01486277170386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vulnerability detection is a crucial component in the software development
lifecycle. Existing vulnerability detectors, especially those based on deep
learning (DL) models, have achieved high effectiveness. Despite their
capability of detecting vulnerable code snippets from given code fragments, the
detectors are typically unable to further locate the fine-grained information
pertaining to the vulnerability, such as the precise vulnerability triggering
locations. In this paper, we propose VULEXPLAINER, a tool for automatically
locating vulnerability-critical code lines from coarse-level vulnerable code
snippets reported by DL-based detectors. Our approach takes advantage of the
code structure and the semantics of the vulnerabilities. Specifically, we
leverage program slicing to get a set of critical program paths containing
vulnerability-triggering and vulnerability-dependent statements and rank them
to pinpoint the most important one (i.e., sub-graph) as the data flow
associated with the vulnerability. We demonstrate that VULEXPLAINER performs
consistently well on four state-of-the-art graph-representation (GP)-based
vulnerability detectors, i.e., it can flag the vulnerability-triggering code
statements with an accuracy of around 90% against eight common C/C++
vulnerabilities, outperforming five widely used GNN-based explanation
approaches. The experimental results demonstrate the effectiveness of
VULEXPLAINER, which provides insights into a promising research line:
integrating program slicing and deep learning for the interpretation of
vulnerable code fragments.
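The slicing-and-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not VULEXPLAINER's implementation: the dependence graph `DEPS`, the per-statement importance values in `SCORES`, the function names, and the mean-score ranking rule are all hypothetical choices made for illustration. The sketch computes a backward slice from a vulnerability-triggering statement, enumerates the dependence paths that reach it, and ranks those paths.

```python
# Toy statements (hypothetical), keyed by line number:
#   1: n = read_len()
#   2: m = n * 2
#   3: src = get_input()
#   4: log(src)
#   5: memcpy(buf, src, m)   <- assumed vulnerability-triggering statement
# DEPS maps each statement to the statements it depends on.
DEPS = {2: [1], 4: [3], 5: [2, 3]}

# Hypothetical per-statement importance, e.g. as a detector might assign.
SCORES = {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.1, 5: 0.7}


def backward_slice(deps, criterion):
    """Return every statement the criterion transitively depends on,
    including the criterion itself."""
    seen = {criterion}
    stack = [criterion]
    while stack:
        node = stack.pop()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


def enumerate_paths(deps, sink):
    """List dependence paths that end at `sink`, each starting at a
    statement with no unvisited dependencies."""
    paths = []

    def walk(path):
        # Skip predecessors already on the path so cycles terminate.
        preds = [p for p in deps.get(path[0], ()) if p not in path]
        if not preds:
            paths.append(path)
            return
        for p in preds:
            walk([p] + path)

    walk([sink])
    return paths


def rank_paths(paths, scores):
    """Order paths by mean statement score, highest first (an assumed rule)."""
    def mean_score(path):
        return sum(scores.get(n, 0.0) for n in path) / len(path)

    return sorted(paths, key=mean_score, reverse=True)


if __name__ == "__main__":
    print(sorted(backward_slice(DEPS, 5)))
    print(rank_paths(enumerate_paths(DEPS, 5), SCORES)[0])
```

In this toy example, statement 4 falls outside the slice because the sink does not depend on it, and the path through statements 1 and 2 ranks above the path through statement 3 because its mean score is higher.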
Related papers
- Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
- Toward Improved Deep Learning-based Vulnerability Detection [6.212044762686268]
Vulnerabilities in datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist.
The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable.
We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units.
We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities.
arXiv Detail & Related papers (2024-03-05T14:57:28Z)
- Vignat: Vulnerability identification by learning code semantics via
graph attention networks [6.433019933439612]
We propose Vignat, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code.
We represent code as fine-grained code property graphs (CPGs) and use graph attention networks (GATs) for vulnerability detection.
arXiv Detail & Related papers (2023-10-30T22:31:38Z)
- A Hierarchical Deep Neural Network for Detecting Lines of Codes with
Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source code, are the main root cause of cyberattacks.
We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
- Multi-context Attention Fusion Neural Network for Software Vulnerability
Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)