Malicious Package Detection in NPM and PyPI using a Single Model of
Malicious Behavior Sequence
- URL: http://arxiv.org/abs/2309.02637v1
- Date: Wed, 6 Sep 2023 00:58:59 GMT
- Title: Malicious Package Detection in NPM and PyPI using a Single Model of
Malicious Behavior Sequence
- Authors: Junan Zhang, Kaifeng Huang, Bihuan Chen, Chong Wang, Zhenhao Tian, Xin
Peng
- Abstract summary: Package registries NPM and PyPI have been flooded with malicious packages.
The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges.
We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
- Score: 7.991922551051611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-source software (OSS) supply chain enlarges the attack surface, which
makes package registries attractive targets for attacks. Recently, package
registries NPM and PyPI have been flooded with malicious packages. The
effectiveness of existing malicious NPM and PyPI package detection approaches
is hindered by two challenges. The first challenge is how to leverage the
knowledge of malicious packages from different ecosystems in a unified way such
that multi-lingual malicious package detection can be feasible. The second
challenge is how to model malicious behavior in a sequential way such that
maliciousness can be precisely captured. To address the two challenges, we
propose and implement Cerebro to detect malicious packages in NPM and PyPI. We
curate a feature set based on a high-level abstraction of malicious behavior to
enable multi-lingual knowledge fusing. We organize extracted features into a
behavior sequence to model sequential malicious behavior. We fine-tune the BERT
model to understand the semantics of malicious behavior. Extensive evaluation
has demonstrated the effectiveness of Cerebro over the state-of-the-art as well
as the practically acceptable efficiency. Cerebro has successfully detected 306
and 196 new malicious packages in PyPI and NPM, and received 385 thank letters
from the official PyPI and NPM teams.
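The abstract's core idea, mapping language-specific API calls to a shared, high-level behavior vocabulary and concatenating them in source order into a sequence for a fine-tuned classifier (e.g. BERT), can be illustrated with a minimal sketch. All names and mappings below are illustrative assumptions, not Cerebro's actual feature set.

```python
# Hypothetical cross-ecosystem abstraction: NPM and PyPI APIs that
# serve the same purpose map to the same high-level behavior token.
BEHAVIOR_VOCAB = {
    "child_process.exec": "EXECUTE_COMMAND",    # NPM
    "subprocess.Popen": "EXECUTE_COMMAND",      # PyPI
    "https.request": "NETWORK_ACCESS",          # NPM
    "urllib.request.urlopen": "NETWORK_ACCESS", # PyPI
    "fs.readFile": "FILE_READ",                 # NPM
    "open": "FILE_READ",                        # PyPI
}

def to_behavior_sequence(api_calls):
    """Map raw API calls (in source order) to an abstract behavior sequence."""
    return [BEHAVIOR_VOCAB[c] for c in api_calls if c in BEHAVIOR_VOCAB]

# A PyPI package and an NPM package with the same malicious pattern
# (read a file, exfiltrate it, run a command) yield the same sequence,
# which is what makes a single multi-lingual model feasible.
pypi_calls = ["open", "urllib.request.urlopen", "subprocess.Popen"]
npm_calls = ["fs.readFile", "https.request", "child_process.exec"]

assert to_behavior_sequence(pypi_calls) == to_behavior_sequence(npm_calls)
print(" ".join(to_behavior_sequence(pypi_calls)))
# FILE_READ NETWORK_ACCESS EXECUTE_COMMAND
```

The resulting token sequence is what a sequence model such as BERT would be fine-tuned on, so the ordering of behaviors (not just their presence) contributes to the maliciousness signal.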
Related papers
- Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry Environments [9.29518367616395]
We present OSCAR, a dynamic code poisoning detection pipeline for NPM and PyPI ecosystems.
OSCAR fully executes packages in a sandbox environment, employs fuzz testing on exported functions and classes, and implements aspect-based behavior monitoring.
We evaluate OSCAR against six existing tools using a comprehensive benchmark dataset of real-world malicious and benign packages.
arXiv Detail & Related papers (2024-09-14T08:01:43Z)
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
arXiv Detail & Related papers (2024-08-04T09:53:50Z)
- DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping [28.852274185512236]
npm is the largest package manager, hosting more than 2 million third-party open-source packages.
In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details.
We propose DONAPI, an automatic malicious npm package detector that combines static and dynamic analysis.
arXiv Detail & Related papers (2024-03-13T08:38:21Z)
- Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata information.
Our experiments indicate a significant reduction in both false positives and false negatives.
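The metadata-based idea above can be sketched as a simple feature extractor over a package's registry record. The feature names and the example record are hypothetical illustrations, not MeMPtec's actual features.

```python
def extract_metadata_features(pkg):
    """Derive simple numeric features from a package's metadata record.

    `pkg` is a dict of registry metadata; missing fields count as absent.
    """
    return {
        "name_length": len(pkg.get("name", "")),
        "has_description": int(bool(pkg.get("description"))),
        "has_repository": int(bool(pkg.get("repository"))),
        "maintainer_count": len(pkg.get("maintainers", [])),
        "release_count": len(pkg.get("releases", [])),
    }

# A typosquatting-style package often has sparse metadata: no
# description, no repository link, a single fresh maintainer.
suspicious = {"name": "requets", "maintainers": ["new-user"], "releases": ["0.0.1"]}
print(extract_metadata_features(suspicious))
```

Such features are cheap to compute because they need no code analysis at all; a standard classifier over them is what trades off false positives against false negatives.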
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
- Model Supply Chain Poisoning: Backdooring Pre-trained Models via Embedding Indistinguishability [61.549465258257115]
We propose a novel and more severe backdoor attack, TransTroj, which enables backdoors embedded in PTMs to transfer efficiently through the model supply chain.
Experimental results show that our method significantly outperforms SOTA task-agnostic backdoor attacks.
arXiv Detail & Related papers (2024-01-29T04:35:48Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
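The token-level idea can be sketched as follows: obtain per-token probabilities from some language model and flag tokens whose surprisal (negative log-probability) is anomalously high. The probabilities below are stubbed and the threshold is a placeholder; this is not the paper's actual detector.

```python
import math

def token_surprisal(probs):
    """Surprisal (negative log-probability) for each token."""
    return [-math.log(p) for p in probs]

def flag_adversarial_tokens(tokens, probs, threshold=6.0):
    """Return tokens whose surprisal exceeds the threshold."""
    return [t for t, s in zip(tokens, token_surprisal(probs)) if s > threshold]

tokens = ["please", "summarize", "zx!!qq", "this", "text"]
probs = [0.2, 0.1, 1e-4, 0.15, 0.12]  # stub: a real LM would supply these
print(flag_adversarial_tokens(tokens, probs))
# ['zx!!qq']
```

Adversarial suffixes produced by gradient-based attacks tend to be highly improbable under a language model, which is why a per-token surprisal threshold can localize them within an otherwise natural prompt.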
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- On the Feasibility of Cross-Language Detection of Malicious Packages in npm and PyPI [6.935278888313423]
Malicious users have started spreading malware by publishing open-source packages containing malicious code.
Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem.
We present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI.
arXiv Detail & Related papers (2023-10-14T12:32:51Z)
- An Empirical Study of Malicious Code In PyPI Ecosystem [15.739368369031277]
PyPI provides a convenient and accessible package management platform to developers.
The rapid development of the PyPI ecosystem has led to a severe problem of malicious package propagation.
We conduct an empirical study to understand the characteristics and current state of the malicious code lifecycle in the PyPI ecosystem.
arXiv Detail & Related papers (2023-09-20T02:51:02Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.