LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability
Types
- URL: http://arxiv.org/abs/2306.06935v1
- Date: Mon, 12 Jun 2023 08:14:16 GMT
- Title: LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability
Types
- Authors: Xin-Cheng Wen, Cuiyun Gao, Feng Luo, Haoyu Wang, Ge Li, and Qing Liao
- Abstract summary: We propose a Long-taIled software VulnerABiLity typE classification approach, called LIVABLE.
LIVABLE consists of two modules: (1) a vulnerability representation learning module, which improves the propagation steps in the GNN, and (2) an adaptive re-weighting module.
A sequence-to-sequence model is also involved to enhance the vulnerability representations.
- Score: 18.949810432641772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior studies generally focus on software vulnerability detection and have
demonstrated the effectiveness of Graph Neural Network (GNN)-based approaches
for the task. Considering the various types of software vulnerabilities and the
associated different degrees of severity, it is also beneficial to determine
the type of each vulnerable code for developers. In this paper, we observe that
the distribution of vulnerability type is long-tailed in practice, where a
small portion of classes have massive samples (i.e., head classes) but the
others contain only a few samples (i.e., tail classes). Directly adopting
previous vulnerability detection approaches tends to result in poor detection
performance, mainly due to two reasons. First, it is difficult to effectively
learn the vulnerability representation due to the over-smoothing issue of GNNs.
Second, vulnerability types in the tail are hard to predict due to the
extremely few associated samples. To alleviate these issues, we propose a
Long-taIled software VulnerABiLity typE classification approach, called
LIVABLE. LIVABLE mainly consists of two modules: (1) a vulnerability
representation learning module, which improves the propagation steps in the
GNN to distinguish node representations via a differentiated propagation
method, with a sequence-to-sequence model also involved to enhance the
vulnerability representations; and (2) an adaptive re-weighting module, which
adjusts the learning weights for different types according to the training
epochs and the numbers of associated samples via a novel training loss.
Related papers
- From Generalist to Specialist: Exploring CWE-Specific Vulnerability Detection [1.9249287163937974]
Each Common Weakness Enumeration (CWE) category represents a unique class of vulnerabilities with distinct characteristics, code semantics, and patterns.
Treating all vulnerabilities as a single label with a binary classification approach may oversimplify the problem.
arXiv Detail & Related papers (2024-08-05T09:12:39Z) - Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
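FGVulDet's combination of per-type classifiers could look roughly like the following one-vs-rest fusion (an illustrative sketch; the paper's exact fusion rule is not given in this summary, and the type names and threshold are hypothetical):

```python
def combine_type_scores(type_scores, threshold=0.5):
    """Illustrative one-vs-rest combination: each per-type classifier
    emits a probability that the code exhibits that vulnerability type;
    report the highest-scoring type, or 'not-vulnerable' if no score
    crosses the threshold."""
    best_type = max(type_scores, key=type_scores.get)
    if type_scores[best_type] < threshold:
        return "not-vulnerable"
    return best_type

# Hypothetical scores from five per-type classifiers for one function.
scores = {
    "buffer-overflow": 0.82,
    "use-after-free": 0.31,
    "integer-overflow": 0.12,
    "format-string": 0.05,
    "double-free": 0.09,
}
verdict = combine_type_scores(scores)
```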
arXiv Detail & Related papers (2024-04-15T09:10:52Z) - Can An Old Fashioned Feature Extraction and A Light-weight Model Improve
Vulnerability Type Identification Performance? [6.423483122892239]
We investigate the problem of vulnerability type identification (VTI)
We evaluate the performance of the well-known and advanced pre-trained models for VTI on a large set of vulnerabilities.
We introduce a lightweight independent component to refine the predictions of the baseline approach.
arXiv Detail & Related papers (2023-06-26T14:28:51Z) - Learning to Quantize Vulnerability Patterns and Match to Locate
Statement-Level Vulnerabilities [19.6975205650411]
A vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns.
During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities.
Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions.
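The codebook matching described above can be pictured as a nearest-neighbour lookup against quantized pattern vectors (a minimal sketch with made-up two-dimensional embeddings and pattern names; the paper's actual matching procedure operates on learned high-dimensional vectors):

```python
def nearest_pattern(embedding, codebook):
    """Match a statement embedding against a codebook of quantized
    vulnerability-pattern vectors by squared Euclidean distance
    (illustrative nearest-neighbour lookup)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(codebook, key=lambda name: sq_dist(embedding, codebook[name]))

# Toy codebook: each entry is a quantized vector for one pattern.
codebook = {
    "off-by-one": [1.0, 0.0],
    "null-deref": [0.0, 1.0],
}
match = nearest_pattern([0.9, 0.1], codebook)
```

During inference, iterating every statement embedding against every codebook entry, as the summary describes, amounts to repeating this lookup over the whole function.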
arXiv Detail & Related papers (2023-05-26T04:13:31Z) - An Unbiased Transformer Source Code Learning with Semantic Vulnerability
Graph [3.3598755777055374]
Current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with vulnerability classifications.
To address these issues, we propose a joint multitasked unbiased vulnerability classifier comprising the transformer RoBERTa and a graph convolutional neural network (GCN).
We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF).
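The SVG construction merges several edge sets over the same statement nodes. A minimal sketch of that merge step, assuming the four flows have already been extracted as edge lists (real SVG construction derives them by parsing the source code):

```python
def build_svg_edges(seq_edges, cfg_edges, dfg_edges, pf_edges):
    """Merge the four edge sets of a semantic vulnerability graph into
    a single typed edge list (u, v, edge_type). The edge-type labels
    here are illustrative placeholders."""
    typed = []
    for etype, edges in [("seq", seq_edges), ("control", cfg_edges),
                         ("data", dfg_edges), ("poacher", pf_edges)]:
        typed.extend((u, v, etype) for u, v in edges)
    return typed

# Toy function with three statements (nodes 0-2): sequential order,
# one branch edge, and one def-use edge.
edges = build_svg_edges(
    seq_edges=[(0, 1), (1, 2)],
    cfg_edges=[(0, 2)],
    dfg_edges=[(0, 2)],
    pf_edges=[],
)
```

Keeping the edge type on each edge is what lets a downstream GNN treat the different flows differently during message passing.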
arXiv Detail & Related papers (2023-04-17T20:54:14Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may rather stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
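A single message-passing step over such a code graph can be sketched as follows (a generic mean-aggregation GNN layer, not the paper's specific architecture; scalar node features stand in for learned embeddings):

```python
def gnn_layer(node_feats, edges):
    """One mean-aggregation message-passing step: each node's new
    feature vector is the average of its own features and those of
    its neighbours (self-loop included). A generic GNN sketch."""
    n = len(node_feats)
    dim = len(node_feats[0])
    neighbours = {i: [i] for i in range(n)}  # self-loops
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    out = []
    for i in range(n):
        group = neighbours[i]
        out.append([sum(node_feats[j][d] for j in group) / len(group)
                    for d in range(dim)])
    return out

# Two connected statement nodes: after one step each node mixes in
# its neighbour's features.
updated = gnn_layer([[1.0], [3.0]], edges=[(0, 1)])
```

Stacking several such layers lets information flow along the parse-derived graph structure, which is how structural regularity of a program informs the final representation.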
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - Adaptive Class Suppression Loss for Long-Tail Object Detection [49.7273558444966]
We devise a novel Adaptive Class Suppression Loss (ACSL) to improve the detection performance of tail categories.
Our ACSL achieves 5.18% and 5.2% improvements with ResNet50-FPN, and sets a new state of the art.
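The core idea of class suppression can be sketched as a per-class weighting of the classification loss (a simplified illustration of the ACSL mechanism; the thresholds and head/tail cutoff below are hypothetical, and the paper's formulation is defined over detector logits):

```python
def acsl_weights(probs, label, class_counts, tail_thresh=100, conf_thresh=0.3):
    """Simplified Adaptive Class Suppression weighting: the ground-truth
    class always contributes to the loss; a negative head class also
    contributes; a negative tail class is suppressed (weight 0) unless
    the model confuses the sample with it, i.e. predicts it confidently."""
    weights = []
    for j, p in enumerate(probs):
        if j == label:
            weights.append(1.0)          # positive term always kept
        elif class_counts[j] >= tail_thresh:
            weights.append(1.0)          # head-class negative kept
        elif p >= conf_thresh:
            weights.append(1.0)          # hard (confusing) tail negative
        else:
            weights.append(0.0)          # easy tail negative suppressed
    return weights

# Sample of class 0; class 1 is an easy tail negative (suppressed),
# class 2 is a confidently-predicted tail negative (kept).
weights = acsl_weights([0.6, 0.05, 0.35], label=0,
                       class_counts=[1000, 50, 50])
```

Suppressing easy tail negatives prevents abundant head-class samples from drowning out the rare positive gradients that tail classes receive.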
arXiv Detail & Related papers (2021-04-02T05:12:31Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.