Related papers: OSS Malicious Package Analysis in the Wild

OSS Malicious Package Analysis in the Wild

URL: http://arxiv.org/abs/2404.04991v2
Date: Sun, 21 Apr 2024 04:44:17 GMT
Title: OSS Malicious Package Analysis in the Wild
Authors: Xiaoyan Zhou, Ying Zhang, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li,
Abstract summary: This paper builds and curates the largest dataset of 23,425 malicious packages from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild.
Score: 17.028240712650486
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign context. In this paper, we first build and curate the largest dataset of 23,425 malicious packages from scattered online sources. We then propose a knowledge graph to represent the OSS malware corpus and conduct malicious package analysis in the wild. Our main findings include (1) it is essential to collect malicious packages from various online sources because there is little data overlap between different sources; (2) despite the sheer volume of SSC attack campaigns, many malicious packages are similar, and unknown/sophisticated attack behaviors have yet to emerge or be detected; (3) OSS malicious package has its distinct life cycle, denoted as {changing->release->detection->removal}, and slightly changing the package (different name) is a widespread attack manner; (4) while malicious packages often lack context about how and who released them, security reports disclose the information about corresponding SSC attack campaigns.

Related papers

LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges [70.85114705489222]
We propose MalwareBench, a benchmark dataset containing 3,520 jailbreaking prompts for malicious code-generation.<n>M MalwareBench is based on 320 manually crafted malicious code generation requirements, covering 11 jailbreak methods and 29 code functionality categories.<n>Experiments show that mainstream LLMs exhibit limited ability to reject malicious code-generation requirements, and the combination of multiple jailbreak methods further reduces the model's security capabilities.
arXiv Detail & Related papers (2025-06-09T12:02:39Z)
S3C2 Summit 2024-09: Industry Secure Software Supply Chain Summit [50.93790634176803]
Over the past several years, there has been an exponential increase in cyberattacks targeting software supply chains.<n>The ever-evolving threat of software supply chain attacks has garnered interest from the software industry and the US government.<n>Three researchers from the NSF-backed Secure Software Supply Chain Center (S3C2) conducted a Secure Software Supply Chain Summit with a diverse set of 12 practitioners from 9 companies.
arXiv Detail & Related papers (2025-05-15T17:48:14Z)
A Study of Malware Prevention in Linux Distributions [7.166753790047296]
Malicious attacks on open source software packages are a growing concern. This study explores the challenges of preventing and detecting malware in Linux distribution package repositories.
arXiv Detail & Related papers (2024-11-17T09:42:08Z)
Discovery of Timeline and Crowd Reaction of Software Vulnerability Disclosures [47.435076500269545]
Apache Log4J was found to be vulnerable to remote code execution attacks. More than 35,000 packages were forced to update their Log4J libraries with the latest version. It is practically reasonable for software developers to update their third-party libraries whenever the software vendors have released a vulnerable-free version.
arXiv Detail & Related papers (2024-11-12T01:55:51Z)
The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
GoSurf: Identifying Software Supply Chain Attack Vectors in Go [9.91891839872381]
We propose a novel taxonomy of 12 distinct attack vectors tailored for the Go language and its package lifecycle. Our work provides preliminary insights for securing the open-source software supply chain within the Go ecosystem.
arXiv Detail & Related papers (2024-07-05T11:52:27Z)
Understanding crypter-as-a-service in a popular underground marketplace [51.328567400947435]
Crypters are pieces of software whose main goal is to transform a target binary so it can avoid detection from Anti Viruses (AVs) applications. The crypter-as-a-service model has gained popularity, in response to the increased sophistication of detection mechanisms. This paper provides the first study on an online underground market dedicated to crypter-as-a-service.
arXiv Detail & Related papers (2024-05-20T08:35:39Z)
The Code the World Depends On: A First Look at Technology Makers' Open Source Software Dependencies [3.6840775431698893]
Open-source software (OSS) supply chain security has become a topic of concern for organizations. Patching an OSS vulnerability can require updating other dependent software products in addition to the original package. We do not know what packages are most critical to patch, hindering efforts to improve OSS security where it is most needed.
arXiv Detail & Related papers (2024-04-17T21:44:38Z)
A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems [13.610690659041417]
Malicious packages have less metadata content and utilize fewer static and dynamic functions than legitimate ones. One dimension in fine-grained information (FGI) has sufficient distinguishable capability to detect malicious packages.
arXiv Detail & Related papers (2024-04-17T15:16:01Z)
How to Craft Backdoors with Unlabeled Data Alone? [54.47006163160948]
Self-supervised learning (SSL) can learn rich features in an economical and scalable way. If the released dataset is maliciously poisoned, backdoored SSL models can behave badly when triggers are injected to test samples. We propose two strategies for poison selection: clustering-based selection using pseudolabels, and contrastive selection derived from the mutual information principle.
arXiv Detail & Related papers (2024-04-10T02:54:18Z)
DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping [28.852274185512236]
npm is the most extensive package manager, hosting more than 2 million third-party open-source packages. In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details. We propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis.
arXiv Detail & Related papers (2024-03-13T08:38:21Z)
Malicious Package Detection using Metadata Information [0.272760415353533]
We introduce a metadata-based malicious package detection model, MeMPtec. MeMPtec extracts a set of features from package metadata information. Our experiments indicate a significant reduction in both false positives and false negatives.
arXiv Detail & Related papers (2024-02-12T06:54:57Z)
MalDICT: Benchmark Datasets on Malware Behaviors, Platforms, Exploitation, and Packers [44.700094741798445]
Existing research on malware classification focuses almost exclusively on two tasks: distinguishing between malicious and benign files and classifying malware by family. We have identified four tasks which are under-represented in prior work: classification by behaviors that malware exhibit, platforms that malware run on, vulnerabilities that malware exploit, and packers that malware are packed with. We are releasing benchmark datasets for each of these four classification tasks, tagged using ClarAVy and comprising nearly 5.5 million malicious files in total.
arXiv Detail & Related papers (2023-10-18T04:36:26Z)
EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
Graph Mining for Cybersecurity: A Survey [61.505995908021525]
The explosive growth of cyber attacks nowadays, such as malware, spam, and intrusions, caused severe consequences on society. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. With the proliferation of graph mining techniques, many researchers investigated these techniques for capturing correlations between cyber entities and achieving high performance.
arXiv Detail & Related papers (2023-04-02T08:43:03Z)
Adversarial Attacks against Windows PE Malware Detection: A Survey of the State-of-the-Art [44.975088044180374]
This paper focuses on malware with the file format of portable executable (PE) in the family of Windows operating systems, namely Windows PE malware. We first outline the general learning framework of Windows PE malware detection based on ML/DL. We then highlight three unique challenges of performing adversarial attacks in the context of PE malware.
arXiv Detail & Related papers (2021-12-23T02:12:43Z)
MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels [21.050311121388813]
We have created the Malware Open-source Threat Intelligence Family (MOTIF) dataset. MOTIF contains 3,095 malware samples from 454 families, making it the largest and most diverse public malware dataset. We provide aliases of the different names used to describe the same malware family, allowing us to benchmark for the first time accuracy of existing tools.
arXiv Detail & Related papers (2021-11-29T23:59:50Z)
A System for Automated Open-Source Threat Intelligence Gathering and Management [53.65687495231605]
SecurityKG is a system for automated OSCTI gathering and management. It uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors.
arXiv Detail & Related papers (2021-01-19T18:31:35Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.