DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping
- URL: http://arxiv.org/abs/2403.08334v1
- Date: Wed, 13 Mar 2024 08:38:21 GMT
- Title: DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping
- Authors: Cheng Huang, Nannan Wang, Ziyan Wang, Siqi Sun, Lingzi Li, Junren Chen, Qianchong Zhao, Jiaxuan Han, Zhen Yang, Lei Shi,
- Abstract summary: npm is the most extensive package manager, hosting more than 2 million third-party open-source packages.
In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details.
We propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis.
- Score: 28.852274185512236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing popularity of modularity in software development comes the rise of package managers and language ecosystems. Among them, npm stands out as the most extensive package manager, hosting more than 2 million third-party open-source packages that greatly simplify the process of building code. However, this openness also brings security risks, as evidenced by numerous package poisoning incidents. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details. Further, we perform manual inspection and API call sequence analysis on packages collected from public datasets and security reports to build a hierarchical classification framework and behavioral knowledge base covering different sensitive behaviors. In addition, we propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis. It makes preliminary judgments on the degree of maliciousness of packages by code reconstruction techniques and static analysis, extracts dynamic API call sequences to confirm and identify obfuscated content that static analysis can not handle alone, and finally tags malicious software packages based on the constructed behavior knowledge base. To date, we have identified and manually confirmed 325 malicious samples and discovered 2 unusual API calls and 246 API call sequences that have not appeared in known samples.
Related papers
- Less Is More: A Mixed-Methods Study on Security-Sensitive API Calls in Java for Better Dependency Selection [3.6525326603691504]
This study aims to aid developers in selecting their dependency by understanding security sensitive APIs in their dependency through call graph analysis.
The number of Security sensitive API calls of functionally similar packages can vary from 0 to 368 in one API category and 0 to 429 in total.
Our survey results show that 73% developers agree that information about the number and type of security-sensitive API calls of functionally similar packages would have been useful in their dependency selection.
arXiv Detail & Related papers (2024-08-05T22:01:18Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones.
One dimension in fine-grained information (FGI) has sufficient distinguishable capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z) - Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata information.
Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z) - On the Feasibility of Cross-Language Detection of Malicious Packages in
npm and PyPI [6.935278888313423]
Malicious users started to spread malware by publishing open-source packages containing malicious code.
Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem.
We present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI.
arXiv Detail & Related papers (2023-10-14T12:32:51Z) - Malicious Package Detection in NPM and PyPI using a Single Model of
Malicious Behavior Sequence [7.991922551051611]
Package registries NPM and PyPI have been flooded with malicious packages.
The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges.
We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
arXiv Detail & Related papers (2023-09-06T00:58:59Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9CL datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z) - GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation [70.75100533512021]
In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of objects.
We propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables.
The label uncertainty generated by GLENet is a plug-and-play module and can be conveniently integrated into existing deep 3D detectors.
arXiv Detail & Related papers (2022-07-06T06:26:17Z) - S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.