Enhancing Security in Third-Party Library Reuse -- Comprehensive Detection of 1-day Vulnerability through Code Patch Analysis
- URL: http://arxiv.org/abs/2411.19648v1
- Date: Fri, 29 Nov 2024 12:02:28 GMT
- Authors: Shangzhi Xu, Jialiang Dong, Weiting Cai, Juanru Li, Arash Shaghaghi, Nan Sun, Siqi Ma
- Abstract summary: Third-party libraries (TPLs) can introduce vulnerabilities (known as 1-day vulnerabilities) because of the low maintenance of TPLs.
VULTURE aims at identifying 1-day vulnerabilities that arise from the reuse of vulnerable TPLs.
VULTURE successfully identified 175 vulnerabilities from 178 reused TPLs.
- Score: 8.897599530972638
- Abstract: Nowadays, software development progresses rapidly to incorporate new features. To facilitate such growth and provide convenience for developers when creating and updating software, reusing open-source software (i.e., third-party library reuse) has become one of the most effective and efficient methods. Unfortunately, the practice of reusing third-party libraries (TPLs) can also introduce vulnerabilities (known as 1-day vulnerabilities) because of the low maintenance of TPLs, resulting in many vulnerable versions remaining in use. If the software incorporating these TPLs fails to detect the introduced vulnerabilities, updates are delayed and the security risks are exacerbated. However, the complicated code dependencies and flexibility of TPL reuse make the detection of 1-day vulnerabilities a challenging task. To support developers in securely reusing TPLs during software development, we design and implement VULTURE, an effective and efficient detection tool that aims to identify 1-day vulnerabilities arising from the reuse of vulnerable TPLs. It first executes a database creation method, TPLFILTER, which leverages a Large Language Model (LLM) to automatically build a unique database for the targeted platform. Instead of relying on code-level similarity comparison, VULTURE employs hashing-based comparison to explore the dependencies among the collected TPLs and identify the similarities between the TPLs and the target projects. Recognizing that developers have the flexibility to reuse TPLs exactly or in a custom manner, VULTURE separately conducts version-based comparison and chunk-based analysis to capture fine-grained semantic features at the function level. We applied VULTURE to 10 real-world projects to assess its effectiveness and efficiency in detecting 1-day vulnerabilities. VULTURE successfully identified 175 vulnerabilities from 178 reused TPLs.
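The abstract distinguishes exact reuse from customized reuse and mentions hashing-based comparison at function granularity. The following is a minimal sketch of that general idea only, not VULTURE's actual algorithm, which the abstract does not specify. All names (`exact_hash`, `chunk_overlap`, the sample C snippets) are hypothetical, and the normalization (stripping C-style comments, collapsing whitespace) and sliding-window chunking are assumptions made for illustration.

```python
import hashlib
import re


def strip_comments(code: str) -> str:
    """Remove C-style block and line comments (illustrative, not a full parser)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", code)


def normalized_lines(code: str) -> list[str]:
    """Collapse whitespace on each line and drop lines that end up empty."""
    lines = (" ".join(line.split()) for line in strip_comments(code).splitlines())
    return [line for line in lines if line]


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_hash(code: str) -> str:
    """One hash per function: matches only when the function is reused as-is."""
    return digest("\n".join(normalized_lines(code)))


def chunk_hashes(code: str, size: int = 2) -> set[str]:
    """Hash every sliding window of `size` normalized lines."""
    lines = normalized_lines(code)
    if not lines:
        return set()
    if len(lines) < size:
        return {digest("\n".join(lines))}
    return {
        digest("\n".join(lines[i:i + size]))
        for i in range(len(lines) - size + 1)
    }


def chunk_overlap(library_code: str, project_code: str, size: int = 2) -> float:
    """Fraction of the library function's chunks that also occur in the project."""
    lib = chunk_hashes(library_code, size)
    if not lib:
        return 0.0
    return len(lib & chunk_hashes(project_code, size)) / len(lib)


# Hypothetical vulnerable library function and two reuse styles.
VULNERABLE = """
int copy(char *dst, const char *src) {
    /* no bounds check */
    strcpy(dst, src);
    return 0;
}
"""

EXACT_COPY = """
int copy(char *dst,   const char *src) {
  strcpy(dst, src);   // vendored from the library
  return 0;
}
"""

CUSTOM_COPY = """
int copy(char *dst, const char *src) {
    log_call("copy");
    strcpy(dst, src);
    return 0;
}
"""

if __name__ == "__main__":
    print("exact reuse detected:", exact_hash(VULNERABLE) == exact_hash(EXACT_COPY))
    print("custom reuse, whole-function match:", exact_hash(VULNERABLE) == exact_hash(CUSTOM_COPY))
    print("custom reuse, chunk overlap:", chunk_overlap(VULNERABLE, CUSTOM_COPY))
```

In this sketch the whole-function hash survives cosmetic changes such as comments and spacing, while the chunk overlap still flags the customized copy even though its whole-function hash no longer matches. A real tool would need a proper parser and a threshold tuned on data; both are outside what the abstract describes.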
Related papers
- Vulnerability-Triggering Test Case Generation from Third-Party Libraries [8.065610614825395]
VULEUT is designed to automatically verify the exploitability of vulnerabilities in third-party libraries commonly used in client software projects.
VULEUT first analyzes the client projects to determine the reachability of vulnerability conditions.
It then leverages the Large Language Model (LLM) to generate unit tests for vulnerability confirmation.
arXiv Detail & Related papers (2024-09-25T07:47:01Z)
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection [23.7268575752712]
Software vulnerabilities (SVs) have emerged as a prevalent and critical concern for safety-critical security systems.
We propose a novel framework that enhances the capability of large language models to learn and utilize semantic and syntactic relationships from source code data for software vulnerability detection (SVD).
arXiv Detail & Related papers (2024-09-02T00:49:02Z)
- SCoPE: Evaluating LLMs for Software Vulnerability Detection [0.0]
This work explores and refines the CVEFixes dataset, which is commonly used to train models for code-related tasks.
The output generated by SCoPE was used to create a new version of CVEFixes.
The results show that SCoPE successfully helped to identify 905 duplicates within the evaluated subset.
arXiv Detail & Related papers (2024-07-19T15:02:00Z)
- Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
- Exploiting Library Vulnerability via Migration Based Automating Test Generation [16.39796265296833]
In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities.
Vulnerability exploits, as code snippets provided for reproducing vulnerabilities after disclosure, contain a wealth of vulnerability-related information.
This study proposes a new method based on vulnerability exploits, called VESTA, which provides vulnerability exploit tests as the basis for developers to decide whether to update dependencies.
arXiv Detail & Related papers (2023-12-15T06:46:45Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data.
To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation.
We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z)
- Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications.
We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes.
We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)