Related papers: Detecting and Characterizing Low and No Functionality Packages in the NPM Ecosystem

URL: http://arxiv.org/abs/2510.04495v1
Date: Mon, 06 Oct 2025 05:11:49 GMT
Title: Detecting and Characterizing Low and No Functionality Packages in the NPM Ecosystem
Authors: Napasorn Tevarut, Brittany Reid, Yutaro Kashiwa, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, Hajimu Iida
Abstract summary: Trivial packages, small modules with low functionality, are common in the npm ecosystem. This paper refines existing definitions and introduces data-only packages that contain no executable logic. A rule-based static analysis method is developed to detect trivial and data-only packages.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Trivial packages, small modules with low functionality, are common in the npm ecosystem and can pose security risks despite their simplicity. This paper refines existing definitions and introduces data-only packages that contain no executable logic. A rule-based static analysis method is developed to detect trivial and data-only packages and evaluate their prevalence and associated risks in the 2025 npm ecosystem. The analysis shows that 17.92% of packages are trivial, with vulnerability levels comparable to non-trivial ones, and data-only packages, though rare, also contain risks. The proposed detection tool achieves 94% accuracy (macro-F1 0.87), enabling effective large-scale analysis to reduce security exposure. These findings suggest that trivial and data-only packages warrant greater attention in dependency management to reduce potential technical debt and security exposure.

Related papers

Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection [105.14032334647932]
Machine-generated texts (MGTs) pose risks such as disinformation and phishing, highlighting the need for reliable detection. Metric-based methods, which extract statistically distinguishable features of MGTs, are often more practical than complex model-based methods that are prone to overfitting. We propose a Markov-informed score calibration strategy that models two relationships of context detection scores that may aid calibration.
arXiv Detail & Related papers (2026-02-08T16:06:12Z)
Rethinking Evaluation of Infrared Small Target Detection [105.59753496831739]
This paper introduces a hybrid-level metric incorporating pixel- and target-level performance, proposes a systematic error analysis method, and emphasizes the importance of cross-dataset evaluation. An open-source toolkit has been released to facilitate standardized benchmarking.
arXiv Detail & Related papers (2025-09-21T02:45:07Z)
MalGuard: Towards Real-Time, Accurate, and Actionable Detection of Malicious Packages in PyPI Ecosystem [11.834078597426409]
Malicious package detection has become a critical task in ensuring the security and stability of PyPI. Existing detection approaches have focused on advancing model selection, evolving from traditional machine learning (ML) models to large language models (LLMs). We propose MalGuard, a novel approach based on graph centrality analysis and the LIME (Local Interpretable Model-agnostic Explanations) algorithm, to detect malicious packages.
arXiv Detail & Related papers (2025-06-17T12:30:56Z)
Mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond [10.072175823846973]
Existing security patches often suffer from inaccurate labels, insufficient contextual information, and undecidable patches. We present mono, a novel framework that simulates human experts' reasoning process to construct reliable vulnerability datasets. mono corrects 31.0% of labeling errors, recovers 89% of inter-procedural vulnerabilities, and reveals that 16.7% of CVEs contain undecidable patches.
arXiv Detail & Related papers (2025-06-04T07:43:04Z)
A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent. This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages. We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z)
Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning. One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems. We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z)
A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones. One dimension in fine-grained information (FGI) has sufficient distinguishable capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z)
Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec. MeMPtec extracts a set of features from package metadata information. Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative. We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
Differential privacy and robust statistics in high dimensions [49.50869296871643]
High-dimensional Propose-Test-Release (HPTR) builds upon three crucial components: the exponential mechanism, robust statistics, and the Propose-Test-Release mechanism. We show that HPTR nearly achieves the optimal sample complexity under several scenarios studied in the literature.
arXiv Detail & Related papers (2021-11-12T06:36:40Z)
Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values [77.99648230758491]
We consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features. The goal is to recommend an action when $Xt$, a degraded version of $Xb$ with missing values, is observed. In particular, we introduce the conservative strategy where the policy is designed to safely handle the uncertainty due to missingness.
arXiv Detail & Related papers (2021-09-08T16:09:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.