LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
- URL: http://arxiv.org/abs/2505.08842v2
- Date: Mon, 30 Jun 2025 16:31:53 GMT
- Title: LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
- Authors: Zekun Wu, Seonglae Cho, Umar Mohammed, Cristian Munoz, Kleyton Costa, Xin Guan, Theo King, Ze Wang, Emre Kazim, Adriano Koshiyama
- Abstract summary: Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance. We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic workflows to perform deep, evidence-based evaluations of these libraries.
- Score: 11.331334831883058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance. We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic workflows to perform deep, evidence-based evaluations of these libraries. Built on a graph-based orchestration of specialized agents, the framework extracts, verifies, and quantifies risk using information from repositories, documentation, and vulnerability databases. LibVulnWatch produces reproducible, governance-aligned scores across five critical domains, publishing results to a public leaderboard for ongoing ecosystem monitoring. Applied to 20 widely used libraries, including ML frameworks, LLM inference engines, and agent orchestration tools, our approach covers up to 88% of OpenSSF Scorecard checks while surfacing up to 19 additional risks per library, such as critical RCE vulnerabilities, missing SBOMs, and regulatory gaps. By integrating advanced language technologies with the practical demands of software risk assessment, this work demonstrates a scalable, transparent mechanism for continuous supply chain evaluation and informed library selection.
Related papers
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.
Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.
We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
- A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? [30.063392019347887]
We present a systematization of knowledge on the safety and security threats of Computer-Using Agents (CUAs).
CUAs are capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps.
arXiv Detail & Related papers (2025-05-16T06:56:42Z)
- Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries with the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerability-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z)
- The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
- On Security Weaknesses and Vulnerabilities in Deep Learning Systems [32.14068820256729]
We specifically look into deep learning (DL) frameworks and perform the first systematic study of vulnerabilities in DL systems.
We propose a two-stream data analysis framework to explore vulnerability patterns from various databases.
We conducted a large-scale empirical study of 3,049 DL vulnerabilities to better understand the patterns of vulnerability and the challenges in fixing them.
arXiv Detail & Related papers (2024-06-12T23:04:13Z)
- Securing the Open RAN Infrastructure: Exploring Vulnerabilities in Kubernetes Deployments [60.51751612363882]
We investigate the security implications of software-based Open Radio Access Network (RAN) systems.
We highlight the presence of potential vulnerabilities and misconfigurations in the infrastructure supporting the Near Real-Time RAN Controller (RIC) cluster.
arXiv Detail & Related papers (2024-05-03T07:18:45Z)
- A Survey of Third-Party Library Security Research in Application Software [3.280510821619164]
With the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent.
Malicious attackers can exploit these vulnerabilities to infiltrate systems, execute unauthorized operations, or steal sensitive information.
Research on third-party libraries in software becomes paramount to address this growing security challenge.
arXiv Detail & Related papers (2024-04-27T16:35:02Z)
- One for All and All for One: GNN-based Control-Flow Attestation for Embedded Devices [16.425360892610986]
Control-Flow Attestation (CFA) is a security service that allows an entity (verifier) to verify the integrity of code execution on a remote computer system.
Existing CFA schemes suffer from impractical assumptions, such as requiring access to the prover's internal state.
We introduce RAGE, a novel, lightweight CFA approach with minimal requirements.
arXiv Detail & Related papers (2024-03-12T10:00:06Z)
- A Landscape Study of Open Source and Proprietary Tools for Software Bill of Materials (SBOM) [3.1190983209295076]
Software Bill of Materials (SBOM) is a repository that inventories all third-party components and dependencies used in an application.
Recent supply chain breaches underscore the urgent need to enhance software security and address vulnerability risks.
This research paper conducts an empirical analysis to assess the current landscape of open-source and proprietary tools related to SBOM.
arXiv Detail & Related papers (2024-02-17T00:36:20Z)
- VULNERLIZER: Cross-analysis Between Vulnerabilities and Software Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries.
It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries.
The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z)
- Identifying Vulnerable Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries [15.573551625937556]
VulLibMiner is the first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries.
We evaluate VulLibMiner against four state-of-the-art/practice approaches for identifying vulnerable libraries, on both their dataset, VeraJava, and our VulLib dataset.
arXiv Detail & Related papers (2023-07-17T02:54:07Z)
- Analyzing Maintenance Activities of Software Libraries [55.2480439325792]
Industrial applications heavily integrate open-source software libraries nowadays.
I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Detecting Security Fixes in Open-Source Repositories using Static Code Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications.
We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes.
We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.