Related papers: Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution

URL: http://arxiv.org/abs/2411.11202v1
Date: Sun, 17 Nov 2024 23:36:27 GMT
Title: Forecasting the risk of software choices: A model to foretell security vulnerabilities from library dependencies and source code evolution
Authors: Carlos E. Budde, Ranindya Paramitha, Fabio Massacci
Abstract summary: We introduce a model capable of vulnerability forecasting at library level. Our model can estimate the probability that a software project faces a CVE disclosure in a future time window.
Score: 4.538870924201896
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Software security mainly studies vulnerability detection: is my code vulnerable today? This hinders risk estimation, so new approaches are emerging to forecast the occurrence of future vulnerabilities. While useful, these approaches are coarse-grained and hard to employ for project-specific technical decisions. We introduce a model capable of vulnerability forecasting at library level. Formalising source-code evolution in time together with library dependency, our model can estimate the probability that a software project faces a CVE disclosure in a future time window. Our approach is white-box and lightweight, which we demonstrate via experiments involving 1255 CVEs and 768 Java libraries, made public as an open-source artifact. Besides probability estimation, e.g. to plan software updates, this formal model can be used to detect security-sensitive points in a project, or to measure the health of a development ecosystem.

Related papers

Wolves in the Repository: A Software Engineering Analysis of the XZ Utils Supply Chain Attack [0.8517406772939294]
The digital economy runs on Open Source Software (OSS), with an estimated 90% of modern applications containing open-source components. This paper examines a sophisticated attack on the XZ Utils project (CVE-2024-3094), where attackers exploited not just code, but the entire open-source development process. Our analysis reveals a new breed of supply chain attack that manipulates software engineering practices themselves.
arXiv Detail & Related papers (2025-04-24T12:06:11Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Uncertainty-Aware Decoding with Minimum Bayes Risk [70.6645260214115]
We show how Minimum Bayes Risk decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method. We show that this modified expected risk is useful for both choosing outputs and deciding when to abstain from generation and can provide improvements without incurring overhead.
arXiv Detail & Related papers (2025-03-07T10:55:12Z)
Enhanced LLM-Based Framework for Predicting Null Pointer Dereference in Source Code [2.2020053359163305]
We propose a novel approach using a fine-tuned Large Language Model (LLM) termed "DeLLNeuN". Our model showed 87% accuracy with 88% precision using the Draper VDISC dataset.
arXiv Detail & Related papers (2024-11-29T19:24:08Z)
Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [75.8161094916476]
We study how to develop a pretrained vision-language model (aka the CLIP model) for acquiring new capabilities or improving existing capabilities of image classification. Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.
arXiv Detail & Related papers (2024-10-04T22:34:58Z)
The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations [76.19419888353586]
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. We present our efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms.
arXiv Detail & Related papers (2024-03-09T21:07:16Z)
Code Ownership in Open-Source AI Software Security [18.779538756226298]
We use code ownership metrics to investigate the correlation with latent vulnerabilities across five prominent open-source AI software projects. The findings suggest a positive relationship between high-level ownership (characterised by a limited number of minor contributors) and a decrease in vulnerabilities. With these novel code ownership metrics, we have implemented a Python-based command-line application to aid project curators and quality assurance professionals in evaluating and benchmarking their on-site projects.
arXiv Detail & Related papers (2023-12-18T00:37:29Z)
Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation [24.668682498171776]
Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. This paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective.
arXiv Detail & Related papers (2023-10-25T00:32:56Z)
VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries. It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries. The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z)
AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities [27.891905729536372]
AIBugHunter is a novel ML-based software vulnerability analysis tool for C/C++ languages that is integrated into Visual Studio Code. We propose a novel multi-objective optimization (MOO)-based vulnerability classification approach and a transformer-based estimation approach to help AIBugHunter accurately identify vulnerability types and estimate severity.
arXiv Detail & Related papers (2023-05-26T04:21:53Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance. We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.