Identifying Vulnerable Third-Party Java Libraries from Textual
Descriptions of Vulnerabilities and Libraries
- URL: http://arxiv.org/abs/2307.08206v3
- Date: Fri, 17 Nov 2023 13:49:00 GMT
- Title: Identifying Vulnerable Third-Party Java Libraries from Textual
Descriptions of Vulnerabilities and Libraries
- Authors: Tianyu Chen, Lin Li, Bingjie Shan, Guangtai Liang, Ding Li, Qianxiang
Wang, Tao Xie
- Abstract summary: VulLibMiner is first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries.
We evaluate VulLibMiner using four state-of-the-art/practice approaches of identifying vulnerable libraries on both their dataset named VeraJava and our VulLib dataset.
- Score: 15.573551625937556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address security vulnerabilities arising from third-party libraries,
security researchers maintain databases monitoring and curating vulnerability
reports. Application developers can identify vulnerable libraries by directly
querying the databases with their used libraries. However, the querying results
of vulnerable libraries are not reliable due to the incompleteness of
vulnerability reports. Thus, current approaches model the task of identifying
vulnerable libraries as a named-entity-recognition (NER) task or an extreme
multi-label learning (XML) task. These approaches suffer from highly inaccurate
results in identifying vulnerable libraries with complex and similar names,
e.g., Java libraries. To address these limitations, in this paper, we propose
VulLibMiner, the first to identify vulnerable libraries from textual
descriptions of both vulnerabilities and libraries, together with VulLib, a
Java vulnerability dataset with their affected libraries. VulLibMiner consists
of a TF-IDF matcher to efficiently screen out a small set of candidate
libraries and a BERT-FNN model to identify vulnerable libraries from these
candidates effectively. We evaluate VulLibMiner using four
state-of-the-art/practice approaches of identifying vulnerable libraries on
both their dataset named VeraJava and our VulLib dataset. Our evaluation
results show that VulLibMiner can effectively identify vulnerable libraries
with an average F1 score of 0.657 while the state-of-the-art/practice
approaches achieve only 0.521.
Related papers
- Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries with the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerable-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z) - Unit Test Generation for Vulnerability Exploitation in Java Third-Party Libraries [10.78078711790757]
VULEUT is designed to automatically verify the exploitability of vulnerabilities in third-party libraries commonly used in client software projects.
VULEUT first analyzes the client projects to determine the reachability of vulnerability conditions.
It then leverages the Large Language Model (LLM) to generate unit tests for vulnerability confirmation.
arXiv Detail & Related papers (2024-09-25T07:47:01Z) - HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that many LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z) - A Survey of Third-Party Library Security Research in Application Software [3.280510821619164]
With the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent.
Malicious attackers can exploit these vulnerabilities to infiltrate systems, execute unauthorized operations, or steal sensitive information.
Research on third-party libraries in software becomes paramount to address this growing security challenge.
arXiv Detail & Related papers (2024-04-27T16:35:02Z) - Exploiting Library Vulnerability via Migration Based Automating Test
Generation [16.39796265296833]
In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities.
Vulnerability exploits, as code snippets provided for reproducing vulnerabilities after disclosure, contain a wealth of vulnerability-related information.
This study proposes a new method based on vulnerability exploits, called VESTA, which provides vulnerability exploit tests as the basis for developers to decide whether to update dependencies.
arXiv Detail & Related papers (2023-12-15T06:46:45Z) - Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z) - Vulnerability Propagation in Package Managers Used in iOS Development [2.9280059958992286]
Vulnerabilities may be found even in well-known libraries.
The library dependency network in the Swift ecosystem encompasses libraries from CocoaPods, Carthage and Swift Package Manager.
Although most libraries with publicly reported vulnerabilities are written in C, the highest impact of publicly reported vulnerabilities originated from libraries written in native iOS languages.
arXiv Detail & Related papers (2023-05-17T16:22:38Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security
Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - CHRONOS: Time-Aware Zero-Shot Identification of Libraries from
Vulnerability Reports [12.257538059511424]
We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning.
The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports.
arXiv Detail & Related papers (2023-01-10T12:57:10Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Autosploit: A Fully Automated Framework for Evaluating the
Exploitability of Security Vulnerabilities [47.748732208602355]
Autosploit is an automated framework for evaluating the exploitability of vulnerabilities.
It automatically tests the exploits on different configurations of the environment.
It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
arXiv Detail & Related papers (2020-06-30T18:49:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.