OSS Malicious Package Analysis in the Wild
- URL: http://arxiv.org/abs/2404.04991v2
- Date: Sun, 21 Apr 2024 04:44:17 GMT
- Title: OSS Malicious Package Analysis in the Wild
- Authors: Xiaoyan Zhou, Ying Zhang, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li,
- Abstract summary: This paper builds and curates the largest dataset of 23,425 malicious packages from scattered online sources.
We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild.
- Score: 17.028240712650486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign context. In this paper, we first build and curate the largest dataset of 23,425 malicious packages from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild. Our main findings include (1) it is essential to collect malicious packages from various online sources because there is little data overlap between different sources; (2) despite the sheer volume of SSC attack campaigns, many malicious packages are similar, and unknown/sophisticated attack behaviors have yet to emerge or be detected; (3) OSS malicious package has its distinct life cycle, denoted as {changing->release->detection->removal}, and slightly changing the package (different name) is a widespread attack manner; (4) while malicious packages often lack context about how and who released them, security reports disclose the information about corresponding SSC attack campaigns.
Related papers
- GoSurf: Identifying Software Supply Chain Attack Vectors in Go [9.91891839872381]
We propose a novel taxonomy of 12 distinct attack vectors tailored for the Go language and its package lifecycle.
Our work provides preliminary insights for securing the open-source software supply chain within the Go ecosystem.
arXiv Detail & Related papers (2024-07-05T11:52:27Z) - Understanding crypter-as-a-service in a popular underground marketplace [51.328567400947435]
Crypters are pieces of software whose main goal is to transform a target binary so it can avoid detection from Anti Viruses (AVs) applications.
The crypter-as-a-service model has gained popularity, in response to the increased sophistication of detection mechanisms.
This paper provides the first study on an online underground market dedicated to crypter-as-a-service.
arXiv Detail & Related papers (2024-05-20T08:35:39Z) - The Code the World Depends On: A First Look at Technology Makers' Open Source Software Dependencies [3.6840775431698893]
Open-source software (OSS) supply chain security has become a topic of concern for organizations.
Patching an OSS vulnerability can require updating other dependent software products in addition to the original package.
We do not know what packages are most critical to patch, hindering efforts to improve OSS security where it is most needed.
arXiv Detail & Related papers (2024-04-17T21:44:38Z) - A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones.
One dimension in fine-grained information (FGI) has sufficient distinguishable capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z) - How to Craft Backdoors with Unlabeled Data Alone? [54.47006163160948]
Self-supervised learning (SSL) can learn rich features in an economical and scalable way.
If the released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected to test samples.
We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle.
arXiv Detail & Related papers (2024-04-10T02:54:18Z) - DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping [28.852274185512236]
npm is the most extensive package manager, hosting more than 2 million third-party open-source packages.
In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details.
We propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis.
arXiv Detail & Related papers (2024-03-13T08:38:21Z) - Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec.
MeMPtec extracts a set of features from package metadata information.
Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z) - EMBERSim: A Large-Scale Databank for Boosting Similarity Search in
Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning.
We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER.
We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z) - Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society.
Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities.
With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z) - A System for Automated Open-Source Threat Intelligence Gathering and
Management [53.65687495231605]
SecurityKG is a system for automated OSCTI gathering and management.
It uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors.
arXiv Detail & Related papers (2021-01-19T18:31:35Z) - Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.