"I wasn't sure if this is indeed a security risk": Data-driven Understanding of Security Issue Reporting in GitHub Repositories of Open Source npm Packages
- URL: http://arxiv.org/abs/2506.07728v1
- Date: Mon, 09 Jun 2025 13:11:35 GMT
- Title: "I wasn't sure if this is indeed a security risk": Data-driven Understanding of Security Issue Reporting in GitHub Repositories of Open Source npm Packages
- Authors: Rajdeep Ghosh, Shiladitya De, Mainack Mondal
- Abstract summary: This work collected 10,907,467 issues reported across GitHub repositories of 45,466 diverse npm packages. We found that the tags associated with these issues indicate the existence of only 0.13% security-related issues. Our approach of manual analysis followed by developing high-accuracy machine learning models identifies 1,617,738 security-related issues which are not tagged as security-related.
- Score: 8.360992461585308
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The npm (Node Package Manager) ecosystem is the most important package manager for JavaScript development, with millions of users. Consequently, a plethora of earlier work investigated how vulnerability reporting, patch propagation, and, in general, detection as well as resolution of security issues in such ecosystems can be facilitated. However, the ground reality of security-related issue reporting by users (and bots) in npm, along with the associated challenges, has been relatively less explored at scale. In this work, we bridge this gap by collecting 10,907,467 issues reported across GitHub repositories of 45,466 diverse npm packages. We found that the tags associated with these issues indicate the existence of only 0.13% security-related issues. However, our approach of manual analysis followed by developing high-accuracy machine learning models identifies 1,617,738 security-related issues which are not tagged as security-related (14.8% of all issues), as well as 4,461,934 comments made on these issues. We found that the bots in wide use today might not be sufficient for either detecting such issues or offering assistance. Furthermore, our analysis of user-developer interaction data hints that many user-reported security issues might not be addressed by developers: they are not tagged as security-related and might be closed without valid justification. A correlation analysis further hints that developers quickly handle security issues with known solutions (e.g., those corresponding to a CVE), whereas security issues without such known solutions (even with reproducible code) might not be resolved. Our findings offer actionable insights for improving security management in open-source ecosystems, highlighting the need for smarter tools and better collaboration. The data and code for this work are available at https://doi.org/10.5281/zenodo.15614029
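As a rough illustration of the label-based triage the abstract describes, the sketch below queries the GitHub issues API for one npm package's repository and flags issues whose labels or titles contain security-related keywords. The repository name, keyword list, and heuristic are illustrative assumptions, not the authors' actual pipeline (which uses manual analysis followed by trained machine learning models).

```python
# Minimal sketch (not the authors' pipeline): fetch issues for one npm package's
# GitHub repository and flag those that look security-related via labels/titles.
# Repo name, keyword list, and token handling are illustrative assumptions.
import os
import requests

SECURITY_KEYWORDS = {"security", "vulnerability", "cve", "xss", "injection", "csrf"}

def fetch_issues(owner, repo, token=None, max_pages=5):
    """Fetch up to max_pages of issues (open and closed) via the GitHub REST API."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    issues = []
    for page in range(1, max_pages + 1):
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/issues",
            headers=headers,
            params={"state": "all", "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        # This endpoint also returns pull requests; keep only real issues.
        issues.extend(i for i in batch if "pull_request" not in i)
    return issues

def looks_security_related(issue):
    """Keyword heuristic over labels and title (a stand-in for the paper's ML models)."""
    labels = " ".join(label["name"].lower() for label in issue.get("labels", []))
    title = (issue.get("title") or "").lower()
    return any(kw in labels or kw in title for kw in SECURITY_KEYWORDS)

if __name__ == "__main__":
    # "lodash/lodash" is just an example of a popular npm package's repository.
    issues = fetch_issues("lodash", "lodash", token=os.environ.get("GITHUB_TOKEN"))
    flagged = [i for i in issues if looks_security_related(i)]
    print(f"{len(flagged)} of {len(issues)} fetched issues look security-related")
```

This heuristic merely shows where issue labels and titles enter the picture; the paper's 0.13% vs. 14.8% gap comes precisely from the fact that keyword tags miss most security-related issues.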
Related papers
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope. We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z) - VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents [74.6761188527948]
Computer-Use Agents (CUAs) with full system access pose significant security and privacy risks. We investigate Visual Prompt Injection (VPI) attacks, where malicious instructions are visually embedded within rendered user interfaces. Our empirical study shows that current CUAs and BUAs can be deceived at rates of up to 51% and 100%, respectively, on certain platforms.
arXiv Detail & Related papers (2025-06-03T05:21:50Z) - Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub [1.2124551005857036]
Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem. We identified 1,756 vulnerable open-source projects, some of which are very influential. We have responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated. (A minimal sketch of the vulnerable path traversal pattern and a common remediation appears after this list.)
arXiv Detail & Related papers (2025-05-26T16:29:21Z) - Towards Trustworthy GUI Agents: A Survey [64.6445117343499]
This survey examines the trustworthiness of GUI agents along five critical dimensions. We identify major challenges, such as vulnerability to adversarial attacks and cascading failure modes in sequential decision-making. As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
arXiv Detail & Related papers (2025-03-30T13:26:00Z) - Investigating Vulnerability Disclosures in Open-Source Software Using Bug Bounty Reports and Security Advisories [6.814841205623832]
We conduct an empirical study on 3,798 reviewed GitHub security advisories and 4,033 disclosed OSS bug bounty reports. We are the first to determine the explicit process describing how OSS vulnerabilities propagate from security advisories and bug bounty reports.
arXiv Detail & Related papers (2025-01-29T16:36:41Z) - Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks.
More than 35,000 packages were forced to update their Log4J libraries with the latest version.
It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerability-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z) - Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers [4.320393382724067]
This study empirically compares the vulnerabilities of ChatGPT and StackOverflow snippets.
ChatGPT snippets contained 248 vulnerabilities compared to the 302 found in SO snippets, i.e., 20% fewer vulnerabilities, a statistically significant difference.
Our findings suggest developers are under-educated on insecure code propagation from both platforms.
arXiv Detail & Related papers (2024-03-22T20:06:41Z) - What Can Self-Admitted Technical Debt Tell Us About Security? A Mixed-Methods Study [6.286506087629511]
Self-Admitted Technical Debt (SATD) can be deemed a dreadful source of information on potentially exploitable vulnerabilities and security flaws.
This work investigates the security implications of SATD from a technical and developer-centred perspective.
arXiv Detail & Related papers (2024-01-23T13:48:49Z) - Communicating on Security within Software Development Issue Tracking [0.0]
We analyse interfaces from prominent issue trackers to see how they support security communication and how they integrate security scoring.
Users in our study were not comfortable with CVSS analysis, though they were able to reason in a manner compatible with CVSS. This suggests that improving communication through CVSS-like questioning in issue-tracking software could elicit better security interactions.
arXiv Detail & Related papers (2023-08-25T16:38:27Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potentially vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - Detecting Security Patches via Behavioral Data in Code Repositories [11.052678122289871]
We present a system that automatically identifies security patches using only developer behavior in the Git repository. We show that we can reveal concealed security patches with an accuracy of 88.3% and an F1 score of 89.8%.
arXiv Detail & Related papers (2023-02-04T06:43:07Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
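The "Eradicating the Unseen" entry above concerns a path traversal vulnerability found across GitHub projects. As a hedged illustration (not the specific code pattern the authors studied), the sketch below shows the generic vulnerable join-then-open pattern and a common remediation that resolves the path and checks containment; the base directory and function names are hypothetical.

```python
# Generic path traversal pattern and a common remediation (illustrative only;
# not the specific vulnerable code pattern studied in the paper).
import os

BASE_DIR = "/srv/app/uploads"  # hypothetical directory an application serves files from

def read_file_vulnerable(user_supplied_name):
    # Vulnerable: a name like "../../etc/passwd" escapes BASE_DIR after the join.
    path = os.path.join(BASE_DIR, user_supplied_name)
    with open(path, "rb") as fh:
        return fh.read()

def read_file_safe(user_supplied_name):
    # Remediation: resolve the path, then verify it is still inside BASE_DIR.
    base = os.path.realpath(BASE_DIR)
    path = os.path.realpath(os.path.join(base, user_supplied_name))
    if os.path.commonpath([path, base]) != base:
        raise ValueError("path traversal attempt blocked")
    with open(path, "rb") as fh:
        return fh.read()
```

The containment check (resolve, then compare against the base directory) is one standard mitigation; allow-listing file names or mapping identifiers to paths server-side are common alternatives.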
This list is automatically generated from the titles and abstracts of the papers on this site.