Unveiling Malicious Logic: Towards a Statement-Level Taxonomy and Dataset for Securing Python Packages
- URL: http://arxiv.org/abs/2512.12559v1
- Date: Sun, 14 Dec 2025 05:28:30 GMT
- Title: Unveiling Malicious Logic: Towards a Statement-Level Taxonomy and Dataset for Securing Python Packages
- Authors: Ahmed Ryan, Junaid Mansur Ifti, Md Erfan, Akond Ashfaque Ur Rahman, Md Rayhanur Rahman
- Abstract summary: Existing datasets label packages as malicious or benign at the package level, but do not specify which statements implement malicious behavior. We construct a statement-level dataset of 370 malicious Python packages with 2,962 labeled occurrences of malicious indicators. We derive a fine-grained taxonomy of 47 malicious indicators across 7 types that capture how adversarial behavior is implemented in code.
- Score: 0.19029675742486804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread adoption of open-source ecosystems enables developers to integrate third-party packages, but also exposes them to malicious packages crafted to execute harmful behavior via public repositories such as PyPI. Existing datasets (e.g., pypi-malregistry, DataDog, OpenSSF, MalwareBench) label packages as malicious or benign at the package level, but do not specify which statements implement malicious behavior. This coarse granularity limits research and practice: models cannot be trained to localize malicious code, detectors cannot justify alerts with code-level evidence, and analysts cannot systematically study recurring malicious indicators or attack chains. To address this gap, we construct a statement-level dataset of 370 malicious Python packages (833 files, 90,527 lines) with 2,962 labeled occurrences of malicious indicators. From these annotations, we derive a fine-grained taxonomy of 47 malicious indicators across 7 types that capture how adversarial behavior is implemented in code, and we apply sequential pattern mining to uncover recurring indicator sequences that characterize common attack workflows. Our contribution enables explainable, behavior-centric detection and supports both semantic-aware model training and practical heuristics for strengthening software supply-chain defenses.
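Statement-level labeling of the kind the abstract describes can be approximated with a lightweight AST pass over package source. The sketch below is illustrative only: the indicator names and the rule table are assumptions for demonstration, not the paper's actual 47-indicator taxonomy.

```python
# Hypothetical sketch: flag individual statements carrying malicious
# indicators, in the spirit of the paper's statement-level annotations.
# SUSPICIOUS_CALLS is an assumed, illustrative rule table.
import ast

SUSPICIOUS_CALLS = {
    "eval": "dynamic-code-execution",
    "exec": "dynamic-code-execution",
    "b64decode": "payload-obfuscation",
    "urlopen": "remote-resource-fetch",
    "system": "os-command-execution",
}

def flag_statements(source: str) -> list[tuple[int, str]]:
    """Return (line_number, indicator_type) for every flagged call."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle bare names (eval) and attributes (os.system) alike.
            name = func.id if isinstance(func, ast.Name) else (
                func.attr if isinstance(func, ast.Attribute) else None)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, SUSPICIOUS_CALLS[name]))
    return sorted(findings)

sample = (
    "import base64\n"
    "payload = base64.b64decode('cHJpbnQoMSk=')\n"
    "exec(payload)\n"
)
print(flag_statements(sample))
# → [(2, 'payload-obfuscation'), (3, 'dynamic-code-execution')]
```

Ordered per-line findings like these are also the natural input for the sequential pattern mining the authors mention: recurring indicator sequences (e.g. obfuscation followed by dynamic execution) characterize common attack workflows.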
Related papers
- Mind the Gap: Evaluating LLMs for High-Level Malicious Package Detection vs. Fine-Grained Indicator Identification [1.1103813686369686]
Large Language Models (LLMs) have emerged as a promising tool for automated security tasks. This paper presents a systematic evaluation of 13 LLMs for detecting malicious software packages.
arXiv Detail & Related papers (2026-02-18T09:36:46Z) - When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents [90.05202259420138]
Computer-use agents (CUAs) can deviate from expected outcomes even under benign input contexts. We introduce the first conceptual and methodological framework for unintended CUA behaviors. We propose AutoElicit, an agentic framework that iteratively perturbs benign instructions using CUA execution feedback.
arXiv Detail & Related papers (2026-02-09T03:20:11Z) - Cutting the Gordian Knot: Detecting Malicious PyPI Packages via a Knowledge-Mining Framework [14.0015860172317]
The Python Package Index (PyPI) has become a target for malicious actors. Current detection tools generate false positive rates of 15-30%, incorrectly flagging one-third of legitimate packages as malicious. We propose PyGuard, a knowledge-driven framework that converts detection failures into useful behavioral knowledge.
arXiv Detail & Related papers (2026-01-23T05:49:09Z) - Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages [10.858565849895314]
Open-source ecosystems such as NPM and PyPI are increasingly targeted by supply chain attacks. We present IntelGuard, a retrieval-augmented generation (RAG) based framework that integrates expert analytical reasoning into automated malicious package detection.
arXiv Detail & Related papers (2026-01-23T05:31:12Z) - Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z) - Defending against Indirect Prompt Injection by Instruction Detection [109.30156975159561]
InstructDetector is a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks. InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z) - ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale [3.259700715934023]
We introduce ConfuGuard, a state-of-the-art detector for package confusion threats. We present the first empirical analysis of benign signals derived from prior package confusion data. We leverage package metadata to distinguish benign packages, and extend support from three up to seven software package registries.
arXiv Detail & Related papers (2025-02-27T21:25:10Z) - A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent. This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages. We propose a data-driven approach that uses machine learning and static analysis to examine a package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z) - An Empirical Study of Malicious Code In PyPI Ecosystem [15.739368369031277]
PyPI provides a convenient and accessible package management platform to developers.
The rapid development of the PyPI ecosystem has led to a severe problem of malicious package propagation.
We conduct an empirical study to understand the characteristics and current state of the malicious code lifecycle in the PyPI ecosystem.
arXiv Detail & Related papers (2023-09-20T02:51:02Z) - Killing Two Birds with One Stone: Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence [8.58275522939837]
The package registries NPM and PyPI have been flooded with malicious packages. The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges. We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
arXiv Detail & Related papers (2023-09-06T00:58:59Z) - On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z) - Robust Deep Semi-Supervised Learning: A Brief Introduction [63.09703308309176]
Semi-supervised learning (SSL) aims to improve learning performance by leveraging unlabeled data when labels are insufficient.
SSL with deep models has proven to be successful on standard benchmark tasks.
However, such models are still vulnerable to various robustness threats in real-world applications.
arXiv Detail & Related papers (2022-02-12T04:16:41Z) - Tracking the risk of a deployed model and detecting harmful distribution
shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.