Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence
- URL: http://arxiv.org/abs/2309.02637v1
- Date: Wed, 6 Sep 2023 00:58:59 GMT
- Title: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence
- Authors: Junan Zhang, Kaifeng Huang, Bihuan Chen, Chong Wang, Zhenhao Tian, Xin Peng
- Abstract summary: Package registries NPM and PyPI have been flooded with malicious packages.
The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges.
We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
- Score: 7.991922551051611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-source software (OSS) supply chain enlarges the attack surface, which
makes package registries attractive targets for attacks. Recently, package
registries NPM and PyPI have been flooded with malicious packages. The
effectiveness of existing malicious NPM and PyPI package detection approaches
is hindered by two challenges. The first challenge is how to leverage the
knowledge of malicious packages from different ecosystems in a unified way such
that multi-lingual malicious package detection can be feasible. The second
challenge is how to model malicious behavior in a sequential way such that
maliciousness can be precisely captured. To address the two challenges, we
propose and implement Cerebro to detect malicious packages in NPM and PyPI. We
curate a feature set based on a high-level abstraction of malicious behavior to
enable multi-lingual knowledge fusing. We organize extracted features into a
behavior sequence to model sequential malicious behavior. We fine-tune the BERT
model to understand the semantics of malicious behavior. Extensive evaluation
has demonstrated the effectiveness of Cerebro over the state-of-the-art, as well
as its practically acceptable efficiency. Cerebro has successfully detected 306
and 196 new malicious packages in PyPI and NPM, respectively, and received 385
thank-you letters from the official PyPI and NPM teams.
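The cross-ecosystem idea in the abstract (map language-specific APIs to a shared, high-level behavior vocabulary, then treat a package as a sequence of behaviors) can be sketched as follows. All API names and behavior tokens below are illustrative stand-ins, not Cerebro's actual feature set:

```python
# Hypothetical sketch of the multi-lingual behavior abstraction described above.
# Language-specific sensitive APIs are mapped to ecosystem-agnostic tokens.
API_TO_BEHAVIOR = {
    "child_process.exec": "PROCESS_SPAWN",        # npm / JavaScript
    "os.system": "PROCESS_SPAWN",                 # PyPI / Python
    "https.request": "NETWORK_ACCESS",            # npm
    "urllib.request.urlopen": "NETWORK_ACCESS",   # PyPI
    "fs.readFileSync": "FILE_READ",               # npm
    "builtins.open": "FILE_READ",                 # PyPI
}

def behavior_sequence(api_calls):
    """Map raw API calls (in call-site order) to a behavior-token sequence.

    Unknown calls are dropped; the result is the kind of language-independent
    sequence a fine-tuned sequence model (e.g., BERT) could classify.
    """
    return [API_TO_BEHAVIOR[c] for c in api_calls if c in API_TO_BEHAVIOR]

# A PyPI package and an npm package with equivalent malicious behavior
# collapse to the same abstract sequence.
pypi_calls = ["urllib.request.urlopen", "builtins.open", "os.system"]
npm_calls = ["https.request", "fs.readFileSync", "child_process.exec"]
```

Collapsing both ecosystems into one vocabulary is what lets a single model fuse knowledge from NPM and PyPI samples.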
Related papers
- A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones.
A single dimension of fine-grained information (FGI) has sufficient distinguishing capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z)
- DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping [28.852274185512236]
npm is the largest package ecosystem, hosting more than 2 million third-party open-source packages.
In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details.
We propose DONAPI, an automatic malicious npm package detector that combines static and dynamic analysis.
arXiv Detail & Related papers (2024-03-13T08:38:21Z)
- Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning [57.50274256088251]
We show that parameter-efficient fine-tuning (PEFT) is more susceptible to weight-poisoning backdoor attacks than full-parameter fine-tuning.
We develop a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through their prediction confidence.
We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods.
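The confidence-based filtering idea can be sketched in miniature: under a suitably trained classifier, clean samples spread probability mass across classes, while backdoored samples still trigger near-certain predictions. Logit values and the threshold below are illustrative, not PSIM's actual configuration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def flag_poisoned(logits_per_sample, threshold=0.99):
    """Flag samples whose top-class confidence exceeds `threshold`.

    Toy version of a confidence-based poisoned-sample filter: a backdoor
    trigger forces an extreme logit, so the sample's top-class probability
    is abnormally close to 1. (The threshold value is illustrative.)
    """
    return [max(softmax(l)) > threshold for l in logits_per_sample]

clean = [0.2, 0.1, 0.3]      # low-confidence logits: mass is spread out
poisoned = [12.0, 0.1, 0.2]  # a trigger forces one extreme logit
```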
arXiv Detail & Related papers (2024-02-19T14:22:54Z)
- Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata.
Our experiments indicate a significant reduction in both false positives and false negatives.
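Metadata-based detection reduces to turning a package's registry record into a feature vector; the feature choices below are illustrative examples of such signals, not MeMPtec's actual feature set:

```python
def metadata_features(meta):
    """Extract simple numeric features from a package-metadata record.

    Sparse metadata (empty description, no maintainers, no repository link)
    is a common signal for throwaway malicious packages. Feature names
    here are hypothetical, for illustration only.
    """
    return {
        "description_len": len(meta.get("description", "")),
        "num_maintainers": len(meta.get("maintainers", [])),
        "num_releases": len(meta.get("releases", [])),
        "has_repository": int(bool(meta.get("repository"))),
    }

# A suspiciously sparse record, as a quickly published malicious package
# might look on a registry.
sparse = {"description": "", "maintainers": [], "releases": ["0.0.1"]}
```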
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
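The token-level intuition can be sketched as a windowed perplexity check: an injected adversarial suffix typically looks improbable to a language model, so windows covering it have high mean negative log-likelihood. The per-token log-probabilities would come from any autoregressive LM; the window size, threshold, and example values below are illustrative:

```python
def flag_high_perplexity_tokens(token_logprobs, window=3, threshold=5.0):
    """Flag token positions covered by any window whose mean negative
    log-likelihood exceeds `threshold`.

    Both hyperparameters are illustrative, not the paper's settings.
    """
    flagged = set()
    for i in range(len(token_logprobs) - window + 1):
        nll = -sum(token_logprobs[i:i + window]) / window
        if nll > threshold:
            flagged.update(range(i, i + window))
    return sorted(flagged)

# Natural tokens near logprob -2; an adversarial tail near -9.
logprobs = [-2.0, -1.5, -2.2, -9.0, -8.5, -9.5]
```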
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- On the Feasibility of Cross-Language Detection of Malicious Packages in npm and PyPI [6.935278888313423]
Malicious actors have begun spreading malware by publishing open-source packages that contain malicious code.
Recent works apply machine learning techniques to detect malicious packages in the npm ecosystem.
We present a novel approach that involves a set of language-independent features and the training of models capable of detecting malicious packages in npm and PyPI.
arXiv Detail & Related papers (2023-10-14T12:32:51Z)
- An Empirical Study of Malicious Code In PyPI Ecosystem [15.739368369031277]
PyPI provides a convenient and accessible package management platform to developers.
The rapid development of the PyPI ecosystem has led to a severe problem of malicious package propagation.
We conduct an empirical study to understand the characteristics and current state of the malicious code lifecycle in the PyPI ecosystem.
arXiv Detail & Related papers (2023-09-20T02:51:02Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
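A shield of this kind can be sketched as a filter between the planner's ranked actions and the executed one: expert rules veto actions, and the best action passing every rule is executed. The function names, state layout, and velocity-regulation rule below are illustrative, not the paper's formulation:

```python
def shielded_action(candidates, rules, state):
    """Return the highest-valued action that no expert rule rejects.

    `candidates` is a list of (action, value) pairs as a planner such as
    POMCP might rank them; `rules` are predicates encoding expert prior
    knowledge of the task.
    """
    for action, _value in sorted(candidates, key=lambda av: -av[1]):
        if all(rule(state, action) for rule in rules):
            return action
    raise RuntimeError("shield rejected every candidate action")

# Example rule (hypothetical): never command high speed near an obstacle,
# in the spirit of the velocity-regulation problem mentioned above.
def speed_rule(state, action):
    return not (state["obstacle_near"] and action == "go_fast")

candidates = [("go_fast", 0.9), ("go_slow", 0.6)]
```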
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
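Submodularity is what makes this tractable: for a monotone submodular objective (such as set coverage), greedy selection is within a (1 - 1/e) factor of optimal, sidestepping the exponential blow-up of planning over all sensor subsets. A minimal greedy sketch, with hypothetical sensor and variable names:

```python
def greedy_sensor_selection(sensor_coverage, budget):
    """Greedily pick up to `budget` sensors maximizing covered variables.

    Set coverage is monotone submodular, so this greedy loop carries the
    classic (1 - 1/e) approximation guarantee. Ties go to the first sensor
    in dict order.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        best = max(
            (s for s in sensor_coverage if s not in chosen),
            key=lambda s: len(sensor_coverage[s] - covered),  # marginal gain
            default=None,
        )
        if best is None or not (sensor_coverage[best] - covered):
            break  # no sensor adds new coverage
        chosen.append(best)
        covered |= sensor_coverage[best]
    return chosen, covered

# Hypothetical sensors and the hidden variables each one observes.
sensors = {
    "cam_left": {"v1", "v2"},
    "cam_right": {"v2", "v3", "v4"},
    "lidar": {"v1", "v4"},
}
```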
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
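The "Full DOS" idea rests on a property of the PE format: within the 64-byte DOS header, the Windows loader only reads the 'MZ' magic (offsets 0-1) and the 4-byte e_lfanew pointer at 0x3C, so every other header byte can carry adversarial payload without breaking execution. A sketch of which offsets are editable (the actual attack additionally optimizes those bytes against the target model, which is omitted here):

```python
def full_dos_editable_offsets(header_len=64):
    """Return the PE DOS-header byte offsets an attacker may perturb
    without breaking loading.

    Preserved bytes: the 'MZ' magic at offsets 0-1 and the 4-byte
    e_lfanew field at 0x3C-0x3F, which points to the PE header.
    """
    keep = {0, 1, 0x3C, 0x3D, 0x3E, 0x3F}
    return [i for i in range(header_len) if i not in keep]
```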
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
- Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate that TROJAN-LM possesses the following properties.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.