ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software
- URL: http://arxiv.org/abs/2408.02153v1
- Date: Sun, 4 Aug 2024 22:13:14 GMT
- Title: ARVO: Atlas of Reproducible Vulnerabilities for Open Source Software
- Authors: Xiang Mei, Pulkit Singh Singaria, Jordi Del Castillo, Haoran Xi, Abdelouahab Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt
- Abstract summary: We introduce ARVO: an Atlas of Reproducible Vulnerabilities in Open-source software.
We reproduce more than 5,000 memory vulnerabilities across over 250 projects.
Our dataset can be automatically updated as OSS-Fuzz finds new vulnerabilities.
- Score: 20.927909014593318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality datasets of real-world vulnerabilities are enormously valuable for downstream research in software security, but existing datasets are typically small, require extensive manual effort to update, and are missing crucial features that such research needs. In this paper, we introduce ARVO: an Atlas of Reproducible Vulnerabilities in Open-source software. By sourcing vulnerabilities from C/C++ projects that Google's OSS-Fuzz discovered and implementing a reliable re-compilation system, we successfully reproduce more than 5,000 memory vulnerabilities across over 250 projects, each with a triggering input, the canonical developer-written patch for fixing the vulnerability, and the ability to automatically rebuild the project from source and run it at its vulnerable and patched revisions. Moreover, our dataset can be automatically updated as OSS-Fuzz finds new vulnerabilities, allowing it to grow over time. We provide a thorough characterization of the ARVO dataset, show that it can locate fixes more accurately than Google's own OSV reproduction effort, and demonstrate its value for future research through two case studies: firstly evaluating real-world LLM-based vulnerability repair, and secondly identifying over 300 falsely patched (still-active) zero-day vulnerabilities from projects improperly labeled by OSS-Fuzz.
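To make the reproduction guarantee concrete, the sketch below shows the kind of check such a dataset enables: rebuild the project at the vulnerable and patched revisions, then confirm that the triggering input crashes the former but not the latter. This is a minimal sketch under stated assumptions; the metadata layout, the `build_at_revision` helper, and the `build.sh` entry point are illustrative, not ARVO's actual interface.

```python
# A minimal sketch (not ARVO's actual interface) of the reproduce-and-verify
# loop such a dataset enables: rebuild at the vulnerable and patched revisions
# and check that the triggering input crashes only the former.
import json
import subprocess
from pathlib import Path


def build_at_revision(project_dir: Path, commit: str) -> Path:
    """Check out a revision and rebuild the fuzz target.

    The build.sh entry point and out/fuzz_target path are assumptions for
    illustration; real OSS-Fuzz projects build inside per-project Docker images.
    """
    subprocess.run(["git", "-C", str(project_dir), "checkout", commit], check=True)
    subprocess.run(["bash", "build.sh"], cwd=project_dir, check=True)
    return project_dir / "out" / "fuzz_target"


def crashes(target: Path, poc: Path) -> bool:
    """Run the target on the triggering input; a sanitizer abort yields a nonzero exit."""
    result = subprocess.run([str(target), str(poc)], capture_output=True)
    return result.returncode != 0


def verify_entry(entry_dir: Path) -> bool:
    """Confirm the PoC crashes the vulnerable build but not the patched one."""
    meta = json.loads((entry_dir / "meta.json").read_text())  # hypothetical metadata layout
    project_dir = Path(meta["project_dir"])
    poc = entry_dir / meta["poc"]

    vuln_target = build_at_revision(project_dir, meta["vulnerable_commit"])
    crash_before = crashes(vuln_target, poc)

    fixed_target = build_at_revision(project_dir, meta["fix_commit"])
    crash_after = crashes(fixed_target, poc)

    return crash_before and not crash_after


if __name__ == "__main__":
    # Hypothetical entry directory; real entries would come from the dataset itself.
    print(verify_entry(Path("entries/example-issue")))
```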
Related papers
- Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) with external knowledge bases.
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - VulZoo: A Comprehensive Vulnerability Intelligence Dataset [12.229092589037808]
VulZoo is a comprehensive vulnerability intelligence dataset that covers 17 popular vulnerability information sources.
We make VulZoo publicly available and maintain it with incremental updates to facilitate future research.
arXiv Detail & Related papers (2024-06-24T06:39:07Z) - On Security Weaknesses and Vulnerabilities in Deep Learning Systems [32.14068820256729]
We specifically look into deep learning (DL) frameworks and perform the first systematic study of vulnerabilities in DL systems.
We propose a two-stream data analysis framework to explore vulnerability patterns from various databases.
We conducted a large-scale empirical study of 3,049 DL vulnerabilities to better understand the patterns of vulnerability and the challenges in fixing them.
arXiv Detail & Related papers (2024-06-12T23:04:13Z) - Exploiting Library Vulnerability via Migration Based Automating Test Generation [16.39796265296833]
In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities.
Vulnerability exploits, the code snippets released to reproduce vulnerabilities after disclosure, contain a wealth of vulnerability-related information.
This study proposes a new method based on vulnerability exploits, called VESTA, which provides vulnerability exploit tests as the basis for developers to decide whether to update dependencies.
arXiv Detail & Related papers (2023-12-15T06:46:45Z) - DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection [29.52887618905746]
This dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits.
Our results show that deep learning is still not ready for vulnerability detection, due to high false positive rate, low F1 score, and difficulty of detecting hard CWEs.
We demonstrate that large language models (LLMs) are a promising research direction for ML-based vulnerability detection, outperforming Graph Neural Networks (GNNs) with code-structure features.
arXiv Detail & Related papers (2023-04-01T23:29:14Z) - The Dark Side of AutoML: Towards Architectural Backdoor Search [49.16544351888333]
EVAS is a new attack that leverages NAS to find neural architectures with inherent backdoors and exploits such vulnerability using input-aware triggers.
EVAS features high evasiveness, transferability, and robustness, thereby expanding the adversary's design spectrum.
This work raises concerns about the current practice of NAS and points to potential directions to develop effective countermeasures.
arXiv Detail & Related papers (2022-10-21T18:13:23Z) - VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - FRUIT: Faithfully Reflecting Updated Information in Text [106.40177769765512]
We introduce the novel generation task of *faithfully reflecting updated information in text* (FRUIT).
Our analysis shows that developing models that can update articles faithfully requires new capabilities for neural generation models.
arXiv Detail & Related papers (2021-12-16T05:21:24Z) - CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software [0.0]
We implement a fully automated dataset collection tool and share an initial release of the resulting vulnerability dataset named CVEfixes.
The dataset is enriched with meta-data such as programming language, and detailed code and security metrics at five levels of abstraction.
CVEfixes supports various types of data-driven software security research, such as vulnerability prediction, vulnerability classification, vulnerability severity prediction, analysis of vulnerability-related code changes, and automated vulnerability repair.
arXiv Detail & Related papers (2021-07-19T11:34:09Z) - Autosploit: A Fully Automated Framework for Evaluating the Exploitability of Security Vulnerabilities [47.748732208602355]
Autosploit is an automated framework for evaluating the exploitability of vulnerabilities.
It automatically tests the exploits on different configurations of the environment.
It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
arXiv Detail & Related papers (2020-06-30T18:49:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.