Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution
- URL: http://arxiv.org/abs/2411.11202v1
- Date: Sun, 17 Nov 2024 23:36:27 GMT
- Title: Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution
- Authors: Carlos E. Budde, Ranindya Paramitha, Fabio Massacci,
- Abstract summary: We introduce a model capable of vulnerability forecasting at library level.
Our model can estimate the probability that a software project faces a CVE disclosure in a future time window.
- Score: 4.538870924201896
- License:
- Abstract: Software security mainly studies vulnerability detection: is my code vulnerable today? This hinders risk estimation, so new approaches are emerging to forecast the occurrence of future vulnerabilities. While useful, these approaches are coarse-grained and hard to employ for project-specific technical decisions. We introduce a model capable of vulnerability forecasting at library level. Formalising source-code evolution in time together with library dependency, our model can estimate the probability that a software project faces a CVE disclosure in a future time window. Our approach is white-box and lightweight, which we demonstrate via experiments involving 1255 CVEs and 768 Java libraries, made public as an open-source artifact. Besides probabilities estimation, e.g. to plan software updates, this formal model can be used to detect security-sensitive points in a project, or measure the health of a development ecosystem.
Related papers
- Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [75.8161094916476]
We study how to develop a pretrained vision-language model (aka the CLIP model) for acquiring new capabilities or improving existing capabilities of image classification.
Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.
arXiv Detail & Related papers (2024-10-04T22:34:58Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations.
We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z) - Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects.
The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities.
With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
arXiv Detail & Related papers (2023-12-18T00:37:29Z) - Enhancing Large Language Models for Secure Code Generation: A
Dataset-driven Study on Vulnerability Mitigation [24.668682498171776]
Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers.
However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities.
This paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective.
arXiv Detail & Related papers (2023-10-25T00:32:56Z) - VULNERLIZER: Cross-analysis Between Vulnerabilities and Software
Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries.
It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries.
The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z) - AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing
Software Vulnerabilities [27.891905729536372]
AIBugHunter is a novel ML-based software vulnerability analysis tool for C/C++ languages that is integrated into Visual Studio Code.
We propose a novel multi-objective optimization (MOO)-based vulnerability classification approach and a transformer-based estimation approach to help AIBugHunter accurately identify vulnerability types and estimate severity.
arXiv Detail & Related papers (2023-05-26T04:21:53Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.