Assumptions to Evidence: Evaluating Security Practices Adoption and Their Impact on Outcomes in the npm Ecosystem
- URL: http://arxiv.org/abs/2504.14026v2
- Date: Sat, 02 Aug 2025 02:00:18 GMT
- Title: Assumptions to Evidence: Evaluating Security Practices Adoption and Their Impact on Outcomes in the npm Ecosystem
- Authors: Nusrat Zahan, Imranur Rahman, Laurie Williams,
- Abstract summary: The goal of this study is to assist practitioners and policymakers in making informed decisions on which security practices to adopt.<n>We analyzed the adoption of security practices and their impact on security outcome metrics across 145K npm packages.<n>Our findings reveal that aggregated adoption of security practices is associated with 5.2 fewer vulnerabilities, 216.8 days faster MTTR, and 52.3 days faster MTTU.
- Score: 5.250288418639076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Practitioners often struggle with the overwhelming number of security practices outlined in cybersecurity frameworks for risk mitigation. Given the limited budget, time, and resources, practitioners want to prioritize the adoption of security practices based on empirical evidence. The goal of this study is to assist practitioners and policymakers in making informed decisions on which security practices to adopt by evaluating the relationship between software security practices adoption and security outcome metrics. To do this, we analyzed the adoption of security practices and their impact on security outcome metrics across 145K npm packages. We selected the OpenSSF Scorecard metrics to automatically measure the adoption of security practices in npm GitHub repositories. We also investigated project-level security outcome metrics: the number of open vulnerabilities (Vul_Count)), mean time to remediate (MTTR) vulnerabilities in dependencies, and mean time to update (MTTU) dependencies. We conducted regression and causal analysis using 11 Scorecard metrics and the aggregated Scorecard score (computed by aggregating individual security practice scores) as predictors and Vul_Count), MTTR, and MTTU as target variables. Our findings reveal that aggregated adoption of security practices is associated with 5.2 fewer vulnerabilities, 216.8 days faster MTTR, and 52.3 days faster MTTU. Repository characteristics have an impact on security practice effectiveness: repositories with high security practice adoptions, especially those that are mature, actively maintained, large in size, have many contributors, few dependencies, and high download volumes, tend to exhibit better outcomes compared to smaller or inactive repositories.
Related papers
- SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond [134.43113804188195]
We introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts.<n>SafeSci comprises SafeSciBench, a multi-disciplinary benchmark with 0.25M samples, and SafeSciTrain, a large-scale dataset containing 1.5M samples for safety enhancement.
arXiv Detail & Related papers (2026-03-02T08:16:04Z) - RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area.<n>We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z) - SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations [0.0]
This paper introduces SecureCAI, a novel defense framework extending Constitutional AI principles with security-aware guardrails.<n>SecureCAI reduces attack success rates by 94.7% compared to baseline models.
arXiv Detail & Related papers (2026-01-12T18:59:45Z) - When Secure Isn't: Assessing the Security of Machine Learning Model Sharing [5.493595534345332]
We evaluate the security posture of frameworks and hubs, assess whether security-oriented mechanisms offer real protection, and survey how users perceive the security narratives surrounding model sharing.<n>Our evaluation shows that most frameworks and hubs address security risks partially at best, often by shifting responsibility to the user.<n>We debunk the misconceptions that the model-sharing problem is largely solved and that its security can be guaranteed by the file format used for sharing.
arXiv Detail & Related papers (2025-09-08T13:55:54Z) - Security Debt in Practice: Nuanced Insights from Practitioners [0.3277163122167433]
Tight deadlines, limited resources, and prioritization of functionality over security can lead to insecure coding practices.<n>Despite their critical importance, there is limited empirical evidence on how software practitioners perceive, manage, and communicate Security Debts.<n>This study is based on semi-structured interviews with 22 software practitioners across various roles, organizations, and countries.
arXiv Detail & Related papers (2025-07-15T14:28:28Z) - When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents [1.0923877073891446]
LLM-based coding agents are rapidly being deployed in software development, yet their security implications remain poorly understood.<n>We conducted the first systematic security evaluation of autonomous coding agents, analyzing over 12,000 actions across five state-of-the-art models.
arXiv Detail & Related papers (2025-07-12T16:11:07Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains.<n>Existing research predominantly concentrates on the security of general large language models.<n>This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment.<n>Certain scenarios suffer 25 times higher attack rates.<n>Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z) - How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance.<n>We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z) - Safe Vision-Language Models via Unsafe Weights Manipulation [75.04426753720551]
We revise safety evaluation by introducing Safe-Ground, a new set of metrics that evaluate safety at different levels of granularity.<n>We take a different direction and explore whether it is possible to make a model safer without training, introducing Unsafe Weights Manipulation (UWM)<n>UWM uses a calibration set of safe and unsafe instances to compare activations between safe and unsafe content, identifying the most important parameters for processing the latter.
arXiv Detail & Related papers (2025-03-14T17:00:22Z) - CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection [2.5228276786940182]
This paper introduces CASTLE, a benchmarking framework for evaluating the vulnerability detection capabilities of different methods.<n>We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs.
arXiv Detail & Related papers (2025-03-12T14:30:05Z) - Beyond the Surface: An NLP-based Methodology to Automatically Estimate CVE Relevance for CAPEC Attack Patterns [42.63501759921809]
We propose a methodology leveraging Natural Language Processing (NLP) to associate Common Vulnerabilities and Exposure (CAPEC) vulnerabilities with Common Attack Patternion and Classification (CAPEC) attack patterns.<n> Experimental evaluations demonstrate superior performance compared to state-of-the-art models.
arXiv Detail & Related papers (2025-01-13T08:39:52Z) - Agent-SafetyBench: Evaluating the Safety of LLM Agents [72.92604341646691]
We introduce Agent-SafetyBench, a benchmark designed to evaluate the safety of large language models (LLMs)<n>Agent-SafetyBench encompasses 349 interaction environments and 2,000 test cases, evaluating 8 categories of safety risks and covering 10 common failure modes frequently encountered in unsafe interactions.<n>Our evaluation of 16 popular LLM agents reveals a concerning result: none of the agents achieves a safety score above 60%.
arXiv Detail & Related papers (2024-12-19T02:35:15Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - Trust, but Verify: Evaluating Developer Behavior in Mitigating Security Vulnerabilities in Open-Source Software Projects [0.11999555634662631]
This study investigates vulnerabilities in dependencies of sampled open-source software (OSS) projects.
We have identified common issues in outdated or unmaintained dependencies, that pose significant security risks.
Results suggest that reducing the number of direct dependencies and prioritizing well-established libraries with strong security records are effective strategies for enhancing the software security landscape.
arXiv Detail & Related papers (2024-08-26T13:46:48Z) - The Art of Defending: A Systematic Evaluation and Analysis of LLM
Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - Online Safety Property Collection and Refinement for Safe Deep
Reinforcement Learning in Mapless Navigation [79.89605349842569]
We introduce the Collection and Refinement of Online Properties (CROP) framework to design properties at training time.
CROP employs a cost signal to identify unsafe interactions and use them to shape safety properties.
We evaluate our approach in several robotic mapless navigation tasks and demonstrate that the violation metric computed with CROP allows higher returns and lower violations over previous Safe DRL approaches.
arXiv Detail & Related papers (2023-02-13T21:19:36Z) - Semantic Similarity-Based Clustering of Findings From Security Testing
Tools [1.6058099298620423]
In particular, it is common practice to use automated security testing tools that generate reports after inspecting a software artifact from multiple perspectives.
To identify these duplicate findings manually, a security expert has to invest resources like time, effort, and knowledge.
In this study, we investigated the potential of applying Natural Language Processing for clustering semantically similar security findings.
arXiv Detail & Related papers (2022-11-20T19:03:19Z) - Do Software Security Practices Yield Fewer Vulnerabilities? [6.6840472845873276]
The goal of this study is to assist practitioners and researchers making informed decisions on which security practices to adopt.
Four security practices were the most important practices influencing vulnerability count.
The number of reported vulnerabilities increased rather than reduced as the aggregate security score of the packages increased.
arXiv Detail & Related papers (2022-10-20T20:04:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.