Related papers: Identifying Vulnerable Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries

URL: http://arxiv.org/abs/2307.08206v3
Date: Fri, 17 Nov 2023 13:49:00 GMT
Title: Identifying Vulnerable Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries
Authors: Tianyu Chen, Lin Li, Bingjie Shan, Guangtai Liang, Ding Li, Qianxiang Wang, Tao Xie
Abstract summary: VulLibMiner is the first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries. We evaluate VulLibMiner using four state-of-the-art/practice approaches of identifying vulnerable libraries on both their dataset named VeraJava and our VulLib dataset.
Score: 15.573551625937556
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To address security vulnerabilities arising from third-party libraries, security researchers maintain databases monitoring and curating vulnerability reports. Application developers can identify vulnerable libraries by directly querying the databases with their used libraries. However, the querying results of vulnerable libraries are not reliable due to the incompleteness of vulnerability reports. Thus, current approaches model the task of identifying vulnerable libraries as a named-entity-recognition (NER) task or an extreme multi-label learning (XML) task. These approaches suffer from highly inaccurate results in identifying vulnerable libraries with complex and similar names, e.g., Java libraries. To address these limitations, in this paper, we propose VulLibMiner, the first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries, together with VulLib, a Java vulnerability dataset with their affected libraries. VulLibMiner consists of a TF-IDF matcher to efficiently screen out a small set of candidate libraries and a BERT-FNN model to identify vulnerable libraries from these candidates effectively. We evaluate VulLibMiner using four state-of-the-art/practice approaches of identifying vulnerable libraries on both their dataset named VeraJava and our VulLib dataset. Our evaluation results show that VulLibMiner can effectively identify vulnerable libraries with an average F1 score of 0.657 while the state-of-the-art/practice approaches achieve only 0.521.
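
The first stage described in the abstract, a TF-IDF matcher that narrows all libraries down to a few candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothed IDF formula, the cosine ranking, the `top_k` parameter, and the sample library descriptions are all assumptions made for illustration. The second stage (the BERT-FNN model that scores each candidate) is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(token_lists):
    # Smoothed inverse document frequency over the library descriptions.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; terms unseen in the library corpus are dropped.
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def screen_candidates(vuln_description, library_descriptions, top_k=2):
    # Rank libraries by TF-IDF cosine similarity to the vulnerability text
    # and keep at most top_k libraries with a non-zero score.
    names = list(library_descriptions)
    lib_tokens = [tokenize(n + " " + library_descriptions[n]) for n in names]
    idf = build_idf(lib_tokens)
    query = tfidf_vector(tokenize(vuln_description), idf)
    scored = [
        (cosine(query, tfidf_vector(tokens, idf)), name)
        for name, tokens in zip(names, lib_tokens)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0.0]


# Hypothetical sample data, written for this sketch only.
LIBRARIES = {
    "org.apache.logging.log4j:log4j-core":
        "Apache Log4j core logging implementation with JNDI lookup support",
    "com.fasterxml.jackson.core:jackson-databind":
        "General data binding for Jackson JSON serialization and deserialization",
    "org.springframework:spring-web":
        "Spring Web HTTP client and servlet integration",
}
VULN = "Remote code execution in Apache Log4j via JNDI lookup in logged messages"

if __name__ == "__main__":
    print(screen_candidates(VULN, LIBRARIES, top_k=2))
```

In a two-stage pipeline like the one the abstract outlines, the cheap lexical screen keeps the expensive neural model from having to score every library for every vulnerability report.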

Related papers

SAVANT: Vulnerability Detection in Application Dependencies through Semantic-Guided Reachability Analysis [6.989158266868967]
Integration of open-source third-party library dependencies in Java development introduces significant security risks. Savant combines semantic preprocessing with LLM-powered context analysis for accurate vulnerability detection. Savant achieves 83.8% precision, 73.8% recall, 69.0% accuracy, and 78.5% F1-score, outperforming state-of-the-art SCA tools.
arXiv Detail & Related papers (2025-06-21T19:48:13Z)
LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries [11.331334831883058]
Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance. We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic to perform deep, evidence-based evaluations of these libraries.
arXiv Detail & Related papers (2025-05-13T12:58:11Z)
Generating Mitigations for Downstream Projects to Neutralize Upstream Library Vulnerability [8.673798395456185]
Third-party libraries are essential in software development as they prevent the need for developers to recreate existing functionalities. Upgrading dependencies to secure versions is not feasible to neutralize vulnerabilities without patches or in projects with specific version requirements. Both the state-of-the-art automatic vulnerability repair and automatic program repair methods fail to address this issue.
arXiv Detail & Related papers (2025-03-31T16:20:29Z)
The Seeds of the FUTURE Sprout from History: Fuzzing for Unveiling Vulnerabilities in Prospective Deep-Learning Libraries [14.260990784121423]
FUTURE is the first universal fuzzing framework tailored for newly introduced and prospective DL libraries. It uses historical bug information from existing libraries and fine-tunes LLMs for specialized code generation. It significantly outperforms existing fuzzers in bug detection, success rate of bug reproduction, validity rate of code generation, and API coverage.
arXiv Detail & Related papers (2024-12-02T09:33:28Z)
Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks. More than 35,000 packages were forced to update their Log4J libraries to the latest version. It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerability-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z)
Unit Test Generation for Vulnerability Exploitation in Java Third-Party Libraries [10.78078711790757]
VULEUT is designed to automatically verify the exploitability of vulnerabilities in third-party libraries commonly used in client software projects. VULEUT first analyzes the client projects to determine the reachability of vulnerability conditions. It then leverages the Large Language Model (LLM) to generate unit tests for vulnerability confirmation.
arXiv Detail & Related papers (2024-09-25T07:47:01Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that much LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
A Survey of Third-Party Library Security Research in Application Software [3.280510821619164]
With the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent. Malicious attackers can exploit these vulnerabilities to infiltrate systems, execute unauthorized operations, or steal sensitive information. Research on third-party libraries in software becomes paramount to address this growing security challenge.
arXiv Detail & Related papers (2024-04-27T16:35:02Z)
Exploiting Library Vulnerability via Migration Based Automating Test Generation [16.39796265296833]
In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities. Vulnerability exploits, as code snippets provided for reproducing vulnerabilities after disclosure, contain a wealth of vulnerability-related information. This study proposes a new method based on vulnerability exploits, called VESTA, which provides vulnerability exploit tests as the basis for developers to decide whether to update dependencies.
arXiv Detail & Related papers (2023-12-15T06:46:45Z)
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models. We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
Vulnerability Propagation in Package Managers Used in iOS Development [2.9280059958992286]
Vulnerabilities may be found even in well-known libraries. The library dependency network in the Swift ecosystem encompasses libraries from CocoaPods, Carthage and Swift Package Manager. Although most libraries with publicly reported vulnerabilities are written in C, the highest impact of publicly reported vulnerabilities originated from libraries written in native iOS languages.
arXiv Detail & Related papers (2023-05-17T16:22:38Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
CHRONOS: Time-Aware Zero-Shot Identification of Libraries from Vulnerability Reports [12.257538059511424]
We propose a practical library identification approach, namely CHRONOS, based on zero-shot learning. The novelty of CHRONOS is three-fold. First, CHRONOS fits into the practical pipeline by considering the chronological order of vulnerability reports.
arXiv Detail & Related papers (2023-01-10T12:57:10Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
Autosploit: A Fully Automated Framework for Evaluating the Exploitability of Security Vulnerabilities [47.748732208602355]
Autosploit is an automated framework for evaluating the exploitability of vulnerabilities. It automatically tests the exploits on different configurations of the environment. It is able to identify the system properties that affect the ability to exploit a vulnerability in both noiseless and noisy environments.
arXiv Detail & Related papers (2020-06-30T18:49:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.