One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
- URL: http://arxiv.org/abs/2512.04338v1
- Date: Wed, 03 Dec 2025 23:53:56 GMT
- Title: One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
- Authors: Biagio Montaruli, Luca Compagna, Serena Elisa Ponta, Davide Balzarotti
- Abstract summary: We introduce a robust detector capable of seamless integration into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. In both settings, reviewing the false positives requires a time budget of only a few minutes.
- Score: 10.03632278118504
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversarial source code transformations and adaptability to the varying false positive rate (FPR) requirements of different actors, from repository maintainers (requiring low FPR) to enterprise security teams (higher FPR tolerance). We introduce a robust detector capable of seamless integration into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. Combining these with adversarial training (AT) enhances detector robustness by 2.5x. We comprehensively evaluate AT effectiveness by testing our detector against 122,398 packages collected daily from PyPI over 80 days, showing that AT needs careful application: it makes the detector more robust to obfuscations and allows finding 10% more obfuscated packages, but slightly decreases performance on non-obfuscated packages. We demonstrate production adaptability of our detector via two case studies: (i) one for PyPI maintainers (tuned at 0.1% FPR) and (ii) one for enterprise teams (tuned at 10% FPR). In the former, we analyze 91,949 packages collected from PyPI over 37 days, achieving a daily detection rate of 2.48 malicious packages with only 2.18 false positives. In the latter, we analyze 1,596 packages adopted by a multinational software company, obtaining only 1.24 false positives daily. These results show that our detector can be seamlessly integrated into both public repositories like PyPI and enterprise ecosystems, ensuring a very low time budget of a few minutes to review the false positives. Overall, we uncovered 346 malicious packages, now reported to the community.
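The abstract's adaptability claim rests on tuning the detection threshold to a deployment-specific false positive budget (0.1% for PyPI maintainers, 10% for enterprise teams). A minimal sketch of how such calibration might work, assuming each package receives a scalar maliciousness score; the `choose_threshold` helper and the sample scores are illustrative, not the paper's implementation:

```python
# Illustrative sketch (not the paper's code): calibrate a score threshold so
# that the false positive rate (FPR) on a benign validation set stays within
# a deployment-specific budget, e.g. 0.1% for repository maintainers or 10%
# for enterprise security teams.

def choose_threshold(benign_scores, target_fpr):
    """Return the lowest threshold whose FPR on benign_scores is <= target_fpr."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # benign packages we may flag
    if allowed >= len(ranked):
        return ranked[-1]  # budget permits flagging every benign sample
    # Set the threshold just above the (allowed + 1)-th highest benign score,
    # so exactly `allowed` benign samples score at or above it.
    return ranked[allowed] + 1e-9

# Hypothetical maliciousness scores for benign packages in a validation set.
benign = [0.02, 0.10, 0.15, 0.22, 0.30, 0.41, 0.55, 0.63, 0.78, 0.91]

strict = choose_threshold(benign, 0.001)  # repository-maintainer setting
loose = choose_threshold(benign, 0.10)    # enterprise setting
assert strict > loose  # a tighter FPR budget forces a stricter threshold
```

In practice the benign validation set would need to be large (the paper evaluates on tens of thousands of daily PyPI packages) for low targets like 0.1% FPR to be meaningful.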
Related papers
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z) - Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework [14.0015860172317]
Python Package Index (PyPI) has become a target for malicious actors. Current detection tools generate false positive rates of 15-30%, incorrectly flagging one-third of legitimate packages as malicious. We propose PyGuard, a knowledge-driven framework that converts detection failures into useful behavioral knowledge.
arXiv Detail & Related papers (2026-01-23T05:49:09Z) - Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z) - An Empirical Study of Vulnerabilities in Python Packages and Their Detection [12.629138654621983]
This paper introduces PyVul, the first comprehensive benchmark suite of Python-package vulnerabilities. PyVul includes 1,157 publicly reported, developer-verified vulnerabilities, each linked to its affected packages. An LLM-assisted data cleansing method is incorporated to improve label accuracy, achieving 100% commit-level and 94% function-level accuracy.
arXiv Detail & Related papers (2025-09-04T14:38:28Z) - DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors [52.85182605005619]
We introduce DyePack, a framework that leverages backdoor attacks to identify models that used benchmark test sets during training. Like how banks mix dye packs with their money to mark robbers, DyePack mixes backdoor samples with the test data to flag models that trained on it. We evaluate DyePack on five models across three datasets, covering both multiple-choice and open-ended generation tasks.
arXiv Detail & Related papers (2025-05-29T02:22:14Z) - DySec: A Machine Learning-based Dynamic Analysis for Detecting Malicious Packages in PyPI Ecosystem [4.045165357831481]
Malicious Python packages make software supply chains vulnerable by exploiting trust in open-source repositories like Python Package Index (PyPI). Lack of real-time behavioral monitoring makes metadata inspection and static code analysis inadequate against advanced attack strategies. We introduce DySec, a machine learning-based dynamic analysis framework for PyPI that uses eBPF kernel and user-level probes to monitor behaviors during package installation.
arXiv Detail & Related papers (2025-03-01T03:20:42Z) - PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. PyPulse provides a modular and extendable framework with high ease of use for a broad user base, including non-machine-learning bioresearchers. We released PyPulse under the MIT License on GitHub and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z) - A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent. This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages. We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z) - Towards Robust Detection of Open Source Software Supply Chain Poisoning Attacks in Industry Environments [9.29518367616395]
We present OSCAR, a dynamic code poisoning detection pipeline for NPM and PyPI ecosystems.
OSCAR fully executes packages in a sandbox environment, employs fuzz testing on exported functions and classes, and implements aspect-based behavior monitoring.
We evaluate OSCAR against six existing tools using a comprehensive benchmark dataset of real-world malicious and benign packages.
arXiv Detail & Related papers (2024-09-14T08:01:43Z) - Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z) - Killing Two Birds with One Stone: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence [8.58275522939837]
Package registries NPM and PyPI have been flooded with malicious packages. The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges. We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
arXiv Detail & Related papers (2023-09-06T00:58:59Z) - Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors.
We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks.
We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.