Can An Old Fashioned Feature Extraction and A Light-weight Model Improve
Vulnerability Type Identification Performance?
- URL: http://arxiv.org/abs/2306.14726v1
- Date: Mon, 26 Jun 2023 14:28:51 GMT
- Title: Can An Old Fashioned Feature Extraction and A Light-weight Model Improve
Vulnerability Type Identification Performance?
- Authors: Hieu Dinh Vo and Son Nguyen
- Abstract summary: We investigate the problem of vulnerability type identification (VTI).
We evaluate the performance of the well-known and advanced pre-trained models for VTI on a large set of vulnerabilities.
We introduce a lightweight independent component to refine the predictions of the baseline approach.
- Score: 6.423483122892239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in automated vulnerability detection have achieved promising
results in helping developers determine vulnerable components. However, after
detecting vulnerabilities, investigating and fixing the vulnerable code is a
non-trivial task. In fact, the type of a vulnerability, such as buffer overflow
or memory corruption, can help developers quickly understand the nature of
the weakness and localize it for security analysis. In this
work, we investigate the problem of vulnerability type identification (VTI).
The problem is modeled as a multi-label classification task, which could be
effectively addressed by the "pre-training, then fine-tuning" framework with deep
pre-trained embedding models. We evaluate the performance of well-known and
advanced pre-trained models for VTI on a large set of vulnerabilities.
Surprisingly, their performance is not much better than that of the classical
baseline approach with old-fashioned bag-of-words TF-IDF features, while these
deep neural network approaches consume far more resources and require GPUs. We
also introduce a lightweight independent component to refine the predictions of
the baseline approach. Our idea is that vulnerability types can
correlate strongly with certain code tokens (distinguishing tokens) in several
crucial parts of programs. The distinguishing tokens for each vulnerability
type are statistically identified based on their prevalence in that type versus
the others. Our results show that the baseline approach enhanced by our
component can outperform the state-of-the-art deep pre-trained approaches while
retaining very high efficiency. Furthermore, the proposed component can also
improve the neural network approaches by up to 92.8% in macro-average F1.
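To make the described pipeline concrete, the following is a minimal sketch of a TF-IDF bag-of-words baseline for multi-label VTI together with a prevalence-ratio heuristic for identifying distinguishing tokens. The vectorizer settings, classifier choice, and scoring statistic are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions: scikit-learn-style inputs; type_labels is a
# binary indicator matrix; the prevalence-ratio score is illustrative).
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def train_baseline(code_snippets, type_labels):
    """Fit a TF-IDF bag-of-words model and a one-vs-rest classifier (multi-label VTI)."""
    vectorizer = TfidfVectorizer(token_pattern=r"\S+", max_features=50000)
    X = vectorizer.fit_transform(code_snippets)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X, type_labels)
    return vectorizer, clf


def distinguishing_tokens(code_snippets, is_type, top_k=50):
    """Rank tokens by prevalence in one vulnerability type versus the rest."""
    in_type, out_type = Counter(), Counter()
    for code, flag in zip(code_snippets, is_type):
        (in_type if flag else out_type).update(set(code.split()))
    n_in = max(1, sum(is_type))
    n_out = max(1, len(is_type) - n_in)
    ratio = {tok: (in_type[tok] / n_in) / (out_type[tok] / n_out + 1e-6)
             for tok in in_type}
    return sorted(ratio, key=ratio.get, reverse=True)[:top_k]
```

At inference time, the refinement component could then adjust a low-confidence prediction when a snippet contains many distinguishing tokens of a type; the paper defines the exact statistic and the crucial program parts over which it is computed.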
Related papers
- Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection.
By incorporating our augmented dataset when fine-tuning a series of representative code pre-trained models, up to a 10.1% increase in accuracy and a 23.6% increase in F1 can be achieved.
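As a toy illustration of a semantics-preserving transformation (the paper's transformation set is richer), identifier renaming changes the token sequence without changing behavior; the regex-based helper below is an assumption for illustration only.

```python
import re

def rename_identifiers(code: str, mapping: dict[str, str]) -> str:
    """Rename identifiers with word-boundary matching; semantics are preserved
    as long as the new names do not collide with existing ones."""
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

original = "int copy(char *dst, char *src, int n) { memcpy(dst, src, n); return n; }"
augmented = rename_identifiers(original, {"dst": "d0", "src": "s0", "n": "len"})
```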
arXiv Detail & Related papers (2024-09-30T21:44:05Z)
- Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
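A hedged sketch of the one-classifier-per-type design the summary describes; the real FGVulDet classifiers are learned over code representations, while the generic probabilistic classifiers and the 0.5 threshold here are placeholder assumptions.

```python
import numpy as np

class PerTypeEnsemble:
    """Combine per-type binary classifiers into a single type prediction."""

    def __init__(self, classifiers):
        # e.g. {"buffer_overflow": clf1, "use_after_free": clf2, ...}
        self.classifiers = classifiers

    def predict(self, features):
        names = list(self.classifiers)
        # Probability of "vulnerable with this type" from each binary model.
        scores = np.stack([self.classifiers[n].predict_proba(features)[:, 1]
                           for n in names], axis=1)
        best = scores.argmax(axis=1)
        return [names[i] if scores[j, i] >= 0.5 else "not_vulnerable"
                for j, i in enumerate(best)]
```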
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust, instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
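For the generic KL-regularized objective max_w <w, loss> - lam * KL(w || uniform) over the probability simplex, the optimal weights are a softmax of the per-instance losses; whether this matches the paper's exact formulation is an assumption.

```python
import numpy as np

def kl_regularized_weights(losses: np.ndarray, lam: float) -> np.ndarray:
    """Instance weights w_i proportional to exp(loss_i / lam), on the simplex."""
    z = losses / lam
    z -= z.max()          # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()
```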
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types [18.949810432641772]
We propose a Long-taIled software VulnerABiLity typE classification approach, called LIVABLE.
LIVABLE consists of two modules, including (1) a vulnerability representation learning module, which improves the propagation steps in GNNs.
A sequence-to-sequence model is also involved to enhance the vulnerability representations.
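For background, a "propagation step" in a GNN is one round of neighbor aggregation; the mean-aggregation step below is a generic illustration, not LIVABLE's specific module.

```python
import numpy as np

def propagate(h: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One message-passing round: mean-aggregate neighbor states, transform, ReLU.
    h: (n_nodes, d) node states; adj: (n_nodes, n_nodes) adjacency; w: (d, d)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    agg = (adj @ h) / deg
    return np.maximum(0.0, agg @ w)
```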
arXiv Detail & Related papers (2023-06-12T08:14:16Z)
- Learning to Quantize Vulnerability Patterns and Match to Locate Statement-Level Vulnerabilities [19.6975205650411]
A vulnerability codebook is learned, which consists of quantized vectors representing various vulnerability patterns.
During inference, the codebook is iterated to match all learned patterns and predict the presence of potential vulnerabilities.
Our approach was extensively evaluated on a real-world dataset comprising more than 188,000 C/C++ functions.
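The matching step can be pictured as nearest-codeword lookup; the embedding source, distance metric, and threshold below are placeholder assumptions.

```python
import numpy as np

def match_codebook(stmt_emb: np.ndarray, codebook: np.ndarray, thresh: float):
    """Return (best_pattern_index, flagged) for one statement embedding.
    codebook: (n_patterns, d) quantized vulnerability-pattern vectors."""
    dists = np.linalg.norm(codebook - stmt_emb, axis=1)
    best = int(dists.argmin())
    return best, bool(dists[best] < thresh)
```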
arXiv Detail & Related papers (2023-05-26T04:13:31Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
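A generic FGSM-style adversarial-training step of the kind such strategies build on; the paper applies the idea to jet-flavor-tagging inputs with attack details not reproduced here (PyTorch, placeholder epsilon).

```python
import torch
import torch.nn.functional as F

def adversarial_step(model, x, y, optimizer, eps=0.01):
    """Train on FGSM-perturbed inputs for one batch."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).detach()  # worst-case input within eps
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```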
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed model achieves a 98.40% F1-score on specific CWEs from the benchmark NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to harden the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
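One simple instance of the coverage paradigm is range monitoring: record per-neuron activation ranges on training data, then flag inputs whose activations leave those ranges. This is a generic illustration, not the paper's exact monitors.

```python
import numpy as np

class RangeMonitor:
    """Flag inputs whose activations fall outside ranges seen in training."""

    def fit(self, activations: np.ndarray):  # shape: (n_train, n_neurons)
        self.lo = activations.min(axis=0)
        self.hi = activations.max(axis=0)
        return self

    def is_suspicious(self, act: np.ndarray, tol: float = 0.0) -> bool:
        outside = (act < self.lo - tol) | (act > self.hi + tol)
        return bool(outside.any())
```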
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to overestimated robustness.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
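A typical AutoAttack evaluation with the public autoattack package looks like the sketch below; the model, test tensors, and epsilon are placeholders.

```python
import torch
from autoattack import AutoAttack

def robust_accuracy(model, x_test, y_test, eps=8 / 255):
    """Run the standard AutoAttack ensemble and report robust accuracy."""
    model.eval()
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)
    with torch.no_grad():
        return (model(x_adv).argmax(1) == y_test).float().mean().item()
```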
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
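A toy illustration of a subgraph trigger: splice a small pattern of extra nodes and edges into a victim graph. GTA learns its triggers; the fixed pattern and single bridge edge here are assumptions for illustration.

```python
import numpy as np

def inject_trigger(adj, feats, trig_adj, trig_feats, attach_node):
    """Append a trigger subgraph and bridge it to one victim node.
    adj: (n, n) adjacency; feats: (n, d); trig_adj: (t, t); trig_feats: (t, d)."""
    n, t = adj.shape[0], trig_adj.shape[0]
    new_adj = np.zeros((n + t, n + t), dtype=adj.dtype)
    new_adj[:n, :n] = adj
    new_adj[n:, n:] = trig_adj
    new_adj[attach_node, n] = new_adj[n, attach_node] = 1  # bridge edge
    return new_adj, np.vstack([feats, trig_feats])
```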
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
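The self-supervised ingredient is that the perturbation is crafted without labels, e.g. by maximizing feature-space distortion between clean and perturbed inputs. A one-step sketch under that assumption; the paper's purifier network and multi-step attack are not reproduced.

```python
import torch

def self_supervised_perturb(feature_extractor, x, eps=8 / 255):
    """Label-free perturbation that maximizes feature distortion (one step)."""
    x_adv = x.clone().requires_grad_(True)
    clean = feature_extractor(x).detach()
    distortion = (feature_extractor(x_adv) - clean).pow(2).mean()
    grad, = torch.autograd.grad(distortion, x_adv)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()
```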
arXiv Detail & Related papers (2020-06-08T20:42:39Z)