Related papers: ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale

ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale

URL: http://arxiv.org/abs/2502.20528v2
Date: Mon, 17 Mar 2025 21:57:16 GMT
Title: ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale
Authors: Wenxin Jiang, Berk Çakar, Mikola Lysenko, James C. Davis,
Abstract summary: ConfuGuard is a solution designed to address the challenges posed by package confusion threats.<n>We present the first empirical analysis of benign signals derived from prior package confusion data.<n>We extend support from three to six software package registries, and leverage package metadata to distinguish benign packages.
Score: 3.259700715934023
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Package confusion attacks such as typosquatting threaten software supply chains. Attackers make packages with names that syntactically or semantically resemble legitimate ones, tricking engineers into installing malware. While prior work has developed defenses against package confusions in some software package registries, notably NPM, PyPI, and RubyGems, gaps remain: high false-positive rates; generalization to more software package ecosystems; and insights from real-world deployment. In this work, we introduce ConfuGuard, a solution designed to address the challenges posed by package confusion threats. We begin by presenting the first empirical analysis of benign signals derived from prior package confusion data, uncovering their threat patterns, engineering practices, and measurable attributes. We observed that 13.3% of real package confusion attacks are initially stealthy, so we take that into consideration and refined the definitions. Building on state-of-the-art approaches, we extend support from three to six software package registries, and leverage package metadata to distinguish benign packages. Our approach significantly reduces 64% false-positive (from 77% to 13%), with acceptable additional overhead to filter out benign packages by analyzing the package metadata. ConfuGuard is in production at our industry partner, whose analysts have already confirmed 301 packages detected by ConfuGuard as real attacks. We share lessons learned from production and provide insights to researchers.

Related papers

Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent.<n>This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages.<n>We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z)
The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment [48.5611060845958]
We propose a novel benchmark of composable jailbreak attacks to move beyond static datasets and of attacks and harms. We use h4rm3l to generate a dataset of 2656 successful novel jailbreak attacks targeting 6 state-of-the-art (SOTA) open-source and proprietary LLMs. Several of our synthesized attacks are more effective than previously reported ones, with Attack Success rates exceeding 90% on SOTA closed language models.
arXiv Detail & Related papers (2024-08-09T01:45:39Z)
GoSurf: Identifying Software Supply Chain Attack Vectors in Go [9.91891839872381]
We propose a novel taxonomy of 12 distinct attack vectors tailored for the Go language and its package lifecycle. Our work provides preliminary insights for securing the open-source software supply chain within the Go ecosystem.
arXiv Detail & Related papers (2024-07-05T11:52:27Z)
EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data. While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated. We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z)
Understanding crypter-as-a-service in a popular underground marketplace [51.328567400947435]
Crypters are pieces of software whose main goal is to transform a target binary so it can avoid detection from Anti Viruses (AVs) applications. The crypter-as-a-service model has gained popularity, in response to the increased sophistication of detection mechanisms. This paper provides the first study on an online underground market dedicated to crypter-as-a-service.
arXiv Detail & Related papers (2024-05-20T08:35:39Z)
A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones. One dimension in fine-grained information (FGI) has sufficient distinguishable capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z)
DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping [28.852274185512236]
npm is the most extensive package manager, hosting more than 2 million third-party open-source packages. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details. We propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis.
arXiv Detail & Related papers (2024-03-13T08:38:21Z)
Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec. MeMPtec extracts a set of features from package metadata information. Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
Malicious Package Detection in NPM and PyPI using a Single Model of Malicious Behavior Sequence [7.991922551051611]
Package registries NPM and PyPI have been flooded with malicious packages. The effectiveness of existing malicious NPM and PyPI package detection approaches is hindered by two challenges. We propose and implement Cerebro to detect malicious packages in NPM and PyPI.
arXiv Detail & Related papers (2023-09-06T00:58:59Z)
VulLibGen: Generating Names of Vulnerability-Affected Packages via a Large Language Model [13.96251273677855]
VulLibGen is a method to directly generate affected packages. It has an average accuracy of 0.806 for identifying vulnerable packages. We have submitted 60 vulnerability, affected package> pairs to GitHub Advisory.
arXiv Detail & Related papers (2023-08-09T02:02:46Z)
A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors. We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks. We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
Robust Encodings: A Framework for Combating Adversarial Typos [85.70270979772388]
NLP systems are easily fooled by small perturbations of inputs. Existing procedures to defend against such perturbations provide guaranteed robustness to worst-case attacks. We introduce robust encodings (RobEn) that confer guaranteed robustness without making compromises on model architecture.
arXiv Detail & Related papers (2020-05-04T01:28:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.