Related papers: The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors

The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors

URL: http://arxiv.org/abs/2401.02737v3
Date: Sat, 7 Sep 2024 12:26:49 GMT
Title: The Vulnerability Is in the Details: Locating Fine-grained Information of Vulnerable Code Identified by Graph-based Detectors
Authors: Baijun Cheng, Kailong Wang, Cuiyun Gao, Xiapu Luo, Li Li, Yao Guo, Xiangqun Chen, Haoyu Wang,
Abstract summary: VULEXPLAINER is a tool for locating vulnerability-critical code lines from coarse-level vulnerable code snippets. It can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities.
Score: 33.395068754566935
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vulnerability detection is a crucial component in the software development lifecycle. Existing vulnerability detectors, especially those based on deep learning (DL) models, have achieved high effectiveness. Despite their capability of detecting vulnerable code snippets from given code fragments, the detectors are typically unable to further locate the fine-grained information pertaining to the vulnerability, such as the precise vulnerability triggering locations.In this paper, we propose VULEXPLAINER, a tool for automatically locating vulnerability-critical code lines from coarse-level vulnerable code snippets reported by DL-based detectors.Our approach takes advantage of the code structure and the semantics of the vulnerabilities. Specifically, we leverage program slicing to get a set of critical program paths containing vulnerability-triggering and vulnerability-dependent statements and rank them to pinpoint the most important one (i.e., sub-graph) as the data flow associated with the vulnerability. We demonstrate that VULEXPLAINER performs consistently well on four state-of-the-art graph-representation(GP)-based vulnerability detectors, i.e., it can flag the vulnerability-triggering code statements with an accuracy of around 90% against eight common C/C++ vulnerabilities, outperforming five widely used GNN-based explanation approaches. The experimental results demonstrate the effectiveness of VULEXPLAINER, which provides insights into a promising research line: integrating program slicing and deep learning for the interpretation of vulnerable code fragments.

Related papers

The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection [98.34703790782254]
We introduce Category Common Prompt CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder. Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing.
arXiv Detail & Related papers (2024-08-19T02:14:25Z)
Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation [41.831831628421675]
Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection. We propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection.
arXiv Detail & Related papers (2024-04-24T06:52:53Z)
Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task. FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection. It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
Toward Improved Deep Learning-based Vulnerability Detection [6.212044762686268]
Vulnerabilities in datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist. The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units. We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities.
arXiv Detail & Related papers (2024-03-05T14:57:28Z)
On the Effectiveness of Function-Level Vulnerability Detectors for Inter-Procedural Vulnerabilities [28.57872406228216]
We propose a tool dubbed VulTrigger for identifying vulnerability-triggering statements across functions. Experimental results show that VulTrigger can effectively identify vulnerability-triggering statements and inter-procedural vulnerabilities. Our findings include: (i) inter-procedural vulnerabilities are prevalent with an average of 2.8 inter-procedural layers; and (ii) function-level vulnerability detectors are much less effective in detecting to-be-patched functions of inter-procedural vulnerabilities.
arXiv Detail & Related papers (2024-01-18T07:32:11Z)
Vignat: Vulnerability identification by learning code semantics via graph attention networks [6.433019933439612]
We propose textitVignat, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code. We represent codes with code property graphs (CPGs) in fine grain and use graph attention networks (GATs) for vulnerability detection.
arXiv Detail & Related papers (2023-10-30T22:31:38Z)
Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications. The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++. This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model. Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z)
Deep-Learning-based Vulnerability Detection in Binary Executables [0.0]
We present a supervised deep learning approach using recurrent neural networks for the application of vulnerability detection based on binary executables. A dataset with 50,651 samples of vulnerable code in the form of a standardized LLVM Intermediate Representation is used. A binary classification was established for detecting the presence of an arbitrary vulnerability, and a multi-class model was trained for the identification of the exact vulnerability.
arXiv Detail & Related papers (2022-11-25T10:33:33Z)
A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source codes, are the main root cause of cyberattacks. We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors. We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware. Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently. The model builds an accurate understanding of code semantics with a lot less learnable parameters. The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
BiDet: An Efficient Binarized Object Detector [96.19708396510894]
We propose a binarized neural network learning method called BiDet for efficient object detection. Our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal. Our method outperforms the state-of-the-art binary neural networks by a sizable margin.
arXiv Detail & Related papers (2020-03-09T08:16:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.