On the Use of Fine-grained Vulnerable Code Statements for Software
Vulnerability Assessment Models
- URL: http://arxiv.org/abs/2203.08417v1
- Date: Wed, 16 Mar 2022 06:29:40 GMT
- Authors: Triet H. M. Le, M. Ali Babar
- Abstract summary: We use large-scale data from 1,782 functions of 429 SVs in 200 real-world projects to develop Machine Learning models for function-level SV assessment tasks.
We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance than non-vulnerable statements.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many studies have developed Machine Learning (ML) approaches to detect
Software Vulnerabilities (SVs) in functions and fine-grained code statements
that cause such SVs. However, there is little work on leveraging such detection
outputs for data-driven SV assessment to give information about exploitability,
impact, and severity of SVs. This information is important for understanding SVs and
prioritizing their fixes. Using large-scale data from 1,782 functions of 429 SVs
in 200 real-world projects, we investigate ML models for automating
function-level SV assessment tasks, i.e., predicting seven Common Vulnerability
Scoring System (CVSS) metrics. We particularly study the value and use of
vulnerable statements as inputs for developing the assessment models because
SVs in functions originate from these statements. We show that vulnerable
statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger
assessment performance (Matthews Correlation Coefficient (MCC)) than
non-vulnerable statements. Incorporating context of vulnerable statements
further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score).
Overall, we provide the initial yet promising ML-based baselines for
function-level SV assessment, paving the way for further research in this
direction.
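
As a first illustration, here is a minimal sketch of how one such function-level assessment model could look, assuming TF-IDF features over vulnerable statements and one classifier per CVSS metric; the toy data, feature extractor, and classifier choice are assumptions for illustration, not the paper's implementation:

```python
# Hedged sketch: one CVSS metric treated as a classification task with
# vulnerable statements as model input. The toy data, TF-IDF features,
# and logistic-regression classifier are assumptions, not the paper's code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.pipeline import make_pipeline

# Hypothetical data: vulnerable statements per function and one CVSS
# metric label (e.g., Severity) per function.
train_stmts = [
    "strcpy(buf, user_input);",
    "system(cmd);",
    "int n = len + off;",
    "memcpy(dst, src, n);",
    "query = base + user_id;",
    "free(p); free(p);",
]
train_labels = ["HIGH", "HIGH", "MEDIUM", "HIGH", "MEDIUM", "MEDIUM"]

# One such model would be trained per CVSS metric (seven in the paper).
model = make_pipeline(
    TfidfVectorizer(token_pattern=r"\S+"),  # whitespace tokens suit code
    LogisticRegression(max_iter=1000),
)
model.fit(train_stmts, train_labels)

test_stmts = ["sprintf(buf, fmt, arg);"]
test_labels = ["HIGH"]
pred = model.predict(test_stmts)
print("MCC:", matthews_corrcoef(test_labels, pred))
print("F1 :", f1_score(test_labels, pred, average="macro", zero_division=0))
```

Incorporating the context of a vulnerable statement, as the abstract describes, would amount to appending neighboring statements to each input string before vectorization.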
Related papers
- Breaking Focus: Contextual Distraction Curse in Large Language Models
We investigate a critical vulnerability in Large Language Models (LLMs): models fail to maintain consistent performance on questions modified with semantically coherent but irrelevant context.
We propose an efficient tree-based search methodology to automatically generate Contextual Distraction Vulnerability (CDV) examples.
arXiv Detail & Related papers (2025-02-03T18:43:36Z)
- VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Vision-language generative reward models (VL-GenRMs) play a crucial role in aligning and evaluating multimodal AI systems.
Current assessment methods rely on AI-annotated preference labels from traditional tasks.
We introduce VL-RewardBench, a benchmark spanning general multimodal queries, visual hallucination detection, and complex reasoning tasks.
arXiv Detail & Related papers (2024-11-26T14:08:34Z)
- Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++
We conduct the first empirical study to investigate and compare the performance of Machine Learning (ML) and Deep Learning (DL) models for function-level SV assessment in C/C++.
We show that ML matches or even outperforms multi-class DL models for function-level SV assessment while requiring significantly less training time.
arXiv Detail & Related papers (2024-07-24T07:26:58Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?
We show that mitigating data imbalance can significantly improve the predictive performance of models for all the Common Vulnerability Scoring System (CVSS) tasks.
We also discover that simple text augmentation like combining random text insertion, deletion, and replacement can outperform the baseline across the board.
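A minimal sketch of the kind of augmentation described here, assuming whitespace-tokenized code; the function, edit rates, and replacement pool below are illustrative assumptions, not that paper's implementation:

```python
# Hedged sketch of simple text augmentation for code: random token
# deletion, replacement, and insertion at a configurable edit rate.
import random

def augment(tokens, rate=0.1, vocab=None, rng=None):
    """Return a noisy copy of tokens; rate is the per-token edit probability."""
    rng = rng or random.Random(0)
    vocab = vocab or list(tokens)  # replacement/insertion pool: an assumption
    out = []
    for tok in tokens:
        r = rng.random()
        if r < rate:                 # random deletion
            continue
        if r < 2 * rate:             # random replacement
            out.append(rng.choice(vocab))
        else:
            out.append(tok)          # keep token unchanged
        if rng.random() < rate:      # random insertion after this token
            out.append(rng.choice(vocab))
    return out

code_tokens = "strcpy ( buf , user_input ) ;".split()
print(" ".join(augment(code_tokens, rate=0.15)))
```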
arXiv Detail & Related papers (2024-07-15T13:47:55Z)
- What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
We perform a large-scale transfer learning experiment aimed at discovering latent vision-language skills from data.
We show that generation tasks suffer from a length bias, suggesting benchmarks should balance tasks with varying output lengths.
We present a new dataset, OLIVE, which simulates user instructions in the wild and presents challenges dissimilar to all datasets we tested.
arXiv Detail & Related papers (2024-04-03T02:40:35Z)
- Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study
Latent vulnerable functions can increase the number of SVs by 4x on average and correct up to 5k mislabeled functions.
Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs.
arXiv Detail & Related papers (2024-01-20T03:36:01Z)
- Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities
We evaluate the effectiveness of 16 pre-trained Large Language Models on 5,000 code samples from five diverse security datasets.
Overall, LLMs show modest effectiveness in detecting vulnerabilities, obtaining an average accuracy of 62.8% and F1 score of 0.71 across datasets.
We find that advanced prompting strategies involving step-by-step analysis significantly improve the performance of LLMs on real-world datasets in terms of F1 score (by up to 0.18 on average).
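For illustration only, here is a hedged sketch of what such a step-by-step prompting strategy could look like; the prompt wording below is an assumption, not that paper's actual prompt:

```python
# Hedged sketch of a step-by-step vulnerability-detection prompt; the
# wording and step list are assumptions, not the paper's actual prompt.
PROMPT_TEMPLATE = """You are a security analyst. Analyze the code step by step:
1. Summarize what the code does.
2. Trace how untrusted input flows through it.
3. Check each dangerous operation (memory safety, bounds, sanitization).
4. Conclude with "vulnerable" or "not vulnerable" and the CWE, if any.

Code:
{code}
"""

def build_prompt(code: str) -> str:
    # The resulting string would be sent to an LLM of choice.
    return PROMPT_TEMPLATE.format(code=code)

print(build_prompt("strcpy(buf, argv[1]);"))
```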
arXiv Detail & Related papers (2023-11-16T13:17:20Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning
We propose a novel Deep multi-task learning model, DeepCVA, to automate seven Commit-level Vulnerability Assessment tasks simultaneously.
We conduct large-scale experiments on 1,229 vulnerability-contributing commits containing 542 different SVs in 246 real-world software projects.
DeepCVA is the best-performing model with 38% to 59.8% higher Matthews Correlation Coefficient than many supervised and unsupervised baseline models.
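A hedged sketch of the underlying multi-task idea, a shared encoder feeding one classification head per assessment task so all seven are predicted simultaneously; the bag-of-tokens encoder, dimensions, and class counts are assumptions, not the DeepCVA architecture:

```python
# Hedged sketch of multi-task learning for commit-level assessment: a
# shared encoder with one classification head per task, trained jointly.
import torch
import torch.nn as nn

class MultiTaskAssessor(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, task_classes=(3,) * 7):
        super().__init__()
        # Shared representation of the commit's code tokens (toy encoder).
        self.encoder = nn.EmbeddingBag(vocab_size, embed_dim)
        # One output head per assessment task (seven, as in DeepCVA).
        self.heads = nn.ModuleList(nn.Linear(embed_dim, c) for c in task_classes)

    def forward(self, token_ids, offsets):
        shared = self.encoder(token_ids, offsets)
        return [head(shared) for head in self.heads]

model = MultiTaskAssessor()
token_ids = torch.tensor([1, 42, 7, 3, 99])  # one toy tokenized commit
offsets = torch.tensor([0])                  # batch of size 1
logits = model(token_ids, offsets)
targets = [torch.tensor([0])] * len(logits)  # toy labels, one per task
loss = sum(nn.functional.cross_entropy(l, t) for l, t in zip(logits, targets))
print(len(logits), "task outputs; joint loss =", float(loss))
```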
arXiv Detail & Related papers (2021-08-18T08:43:36Z)