LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
- URL: http://arxiv.org/abs/2505.08842v2
- Date: Mon, 30 Jun 2025 16:31:53 GMT
- Title: LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries
- Authors: Zekun Wu, Seonglae Cho, Umar Mohammed, Cristian Munoz, Kleyton Costa, Xin Guan, Theo King, Ze Wang, Emre Kazim, Adriano Koshiyama,
- Abstract summary: Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance.<n>We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic to perform deep, evidence-based evaluations of these libraries.
- Score: 11.331334831883058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance. We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic workflows to perform deep, evidence-based evaluations of these libraries. Built on a graph-based orchestration of specialized agents, the framework extracts, verifies, and quantifies risk using information from repositories, documentation, and vulnerability databases. LibVulnWatch produces reproducible, governance-aligned scores across five critical domains, publishing results to a public leaderboard for ongoing ecosystem monitoring. Applied to 20 widely used libraries, including ML frameworks, LLM inference engines, and agent orchestration tools, our approach covers up to 88% of OpenSSF Scorecard checks while surfacing up to 19 additional risks per library, such as critical RCE vulnerabilities, missing SBOMs, and regulatory gaps. By integrating advanced language technologies with the practical demands of software risk assessment, this work demonstrates a scalable, transparent mechanism for continuous supply chain evaluation and informed library selection.
Related papers
- RedSage: A Cybersecurity Generalist LLM [45.91667919408369]
RedSage is an open-source, locally deployable cybersecurity assistant with domain-aware pretraining and post-training.<n>We use a large-scale web filtering and manual collection of high-quality resources, spanning 28.6K documents across frameworks, offensive techniques, and security tools.<n>RedSage is evaluated on established cybersecurity benchmarks (e.g., CTI-Bench, CyberMetric, SECURE) and general LLM benchmarks to assess broader generalization.
arXiv Detail & Related papers (2026-01-29T18:59:57Z) - Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - Automated Vulnerability Validation and Verification: A Large Language Model Approach [7.482522010482827]
This paper introduces an end-to-end multi-step pipeline leveraging generative AI, specifically large language models (LLMs)<n>Our approach extracts information from CVE disclosures in the National Vulnerability Database.<n>It augments it with external public knowledge (e.g., threat advisories, code snippets) using Retrieval-Augmented Generation (RAG)<n>The pipeline iteratively refines generated artifacts, validates attack success with test cases, and supports complex multi-container setups.
arXiv Detail & Related papers (2025-09-28T19:16:12Z) - CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? [30.063392019347887]
We present a systematization of knowledge on the safety and security threats of emphComputer-Using Agents.<n> CUAs are capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps.
arXiv Detail & Related papers (2025-05-16T06:56:42Z) - Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries with the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerable-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - On Security Weaknesses and Vulnerabilities in Deep Learning Systems [32.14068820256729]
We specifically look into deep learning (DL) framework and perform the first systematic study of vulnerabilities in DL systems.
We propose a two-stream data analysis framework to explore vulnerability patterns from various databases.
We conducted a large-scale empirical study of 3,049 DL vulnerabilities to better understand the patterns of vulnerability and the challenges in fixing them.
arXiv Detail & Related papers (2024-06-12T23:04:13Z) - Securing the Open RAN Infrastructure: Exploring Vulnerabilities in Kubernetes Deployments [60.51751612363882]
We investigate the security implications of and software-based Open Radio Access Network (RAN) systems.
We highlight the presence of potential vulnerabilities and misconfigurations in the infrastructure supporting the Near Real-Time RAN Controller (RIC) cluster.
arXiv Detail & Related papers (2024-05-03T07:18:45Z) - A Survey of Third-Party Library Security Research in Application Software [3.280510821619164]
With the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent.
Malicious attackers can exploit these vulnerabilities to infiltrate systems, execute unauthorized operations, or steal sensitive information.
Research on third-party libraries in software becomes paramount to address this growing security challenge.
arXiv Detail & Related papers (2024-04-27T16:35:02Z) - One for All and All for One: GNN-based Control-Flow Attestation for
Embedded Devices [16.425360892610986]
Control-Flow (CFA) is a security service that allows an entity (verifier) to verify the integrity of code execution on a remote computer system.
Existing CFA schemes suffer from impractical assumptions, such as requiring access to the prover's internal state.
We introduce RAGE, a novel, lightweight CFA approach with minimal requirements.
arXiv Detail & Related papers (2024-03-12T10:00:06Z) - A Landscape Study of Open Source and Proprietary Tools for Software Bill
of Materials (SBOM) [3.1190983209295076]
Software Bill of Materials (SBOM) is a repository that inventories all third-party components and dependencies used in an application.
Recent supply chain breaches underscore the urgent need to enhance software security and vulnerability risks.
This research paper conducts an empirical analysis to assess the current landscape of open-source and proprietary tools related to SBOM.
arXiv Detail & Related papers (2024-02-17T00:36:20Z) - VULNERLIZER: Cross-analysis Between Vulnerabilities and Software
Libraries [4.2755847332268235]
VULNERLIZER is a novel framework for cross-analysis between vulnerabilities and software libraries.
It uses CVE and software library data together with clustering algorithms to generate links between vulnerabilities and libraries.
The trained model reaches a prediction accuracy of 75% or higher.
arXiv Detail & Related papers (2023-09-18T10:34:47Z) - Identifying Vulnerable Third-Party Java Libraries from Textual
Descriptions of Vulnerabilities and Libraries [15.573551625937556]
VulLibMiner is first to identify vulnerable libraries from textual descriptions of both vulnerabilities and libraries.
We evaluate VulLibMiner using four state-of-the-art/practice approaches of identifying vulnerable libraries on both their dataset named VeraJava and our VulLib dataset.
arXiv Detail & Related papers (2023-07-17T02:54:07Z) - Analyzing Maintenance Activities of Software Libraries [55.2480439325792]
Industrial applications heavily integrate open-source software libraries nowadays.<n>I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Detecting Security Fixes in Open-Source Repositories using Static Code
Analyzers [8.716427214870459]
We study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications.
We investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes.
We find that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities.
arXiv Detail & Related papers (2021-05-07T15:57:17Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.