Related papers: Malware families discovery via Open-Set Recognition on Android manifest permissions

URL: http://arxiv.org/abs/2505.12750v1
Date: Mon, 19 May 2025 06:19:54 GMT
Title: Malware families discovery via Open-Set Recognition on Android manifest permissions
Authors: Filippo Leveni, Matteo Mistura, Francesco Iubatti, Carmine Giangregorio, Nicolò Pastore, Cesare Alippi, Giacomo Boracchi
Abstract summary: Classifying malware programs into their respective families is essential for building effective defenses against cyber threats. We present a malware classification system that, on top of classifying known malware, detects new ones. Our solution turns out to be very practical, as it can be seamlessly employed in a standard classification workflow.
Score: 15.838751258859004
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Malware are malicious programs that are grouped into families based on their penetration technique, source code, and other characteristics. Classifying malware programs into their respective families is essential for building effective defenses against cyber threats. Machine learning models have a huge potential in malware detection on mobile devices, as malware families can be recognized by classifying permission data extracted from Android manifest files. Still, the malware classification task is challenging due to the high-dimensional nature of permission data and the limited availability of training samples. In particular, the steady emergence of new malware families makes it impossible to acquire a comprehensive training set covering all the malware classes. In this work, we present a malware classification system that, on top of classifying known malware, detects new ones. In particular, we combine an open-set recognition technique developed within the computer vision community, namely MaxLogit, with a tree-based Gradient Boosting classifier, which is particularly effective in classifying high-dimensional data. Our solution turns out to be very practical, as it can be seamlessly employed in a standard classification workflow, and efficient, as it adds minimal computational overhead. Experiments on public and proprietary datasets demonstrate the potential of our solution, which has been deployed in a business environment.
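The MaxLogit rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes the gradient boosting classifier exposes one raw (pre-softmax) score per known family, and it flags a sample as a new family when its largest score falls below a threshold. The function names, the `UNKNOWN` label, and the quantile-based threshold calibration are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label for "sample from a new, unseen family"


def max_logit(logits: Sequence[float]) -> float:
    """MaxLogit score: the largest raw (pre-softmax) class score."""
    return max(logits)


def calibrate_threshold(known_logits: List[Sequence[float]],
                        reject_rate: float = 0.05) -> float:
    """Choose a threshold from validation samples of known families.

    Assumption: we accept rejecting roughly `reject_rate` of known
    samples, so the threshold is the corresponding low quantile of
    their MaxLogit scores.
    """
    scores = sorted(max_logit(row) for row in known_logits)
    index = min(int(reject_rate * len(scores)), len(scores) - 1)
    return scores[index]


def predict_open_set(logits: Sequence[float], threshold: float) -> int:
    """Return the predicted family index, or UNKNOWN if the score is too low."""
    if max_logit(logits) < threshold:
        return UNKNOWN
    return max(range(len(logits)), key=lambda k: logits[k])


# Toy raw scores for three known families (made-up numbers).
validation_logits = [
    [4.0, 0.5, -1.0],
    [0.2, 3.5, 0.1],
    [-0.5, 0.3, 5.1],
    [2.9, 1.0, 0.0],
]
threshold = calibrate_threshold(validation_logits, reject_rate=0.25)

confident_sample = [0.1, 4.2, 0.3]  # one family clearly dominates
ambiguous_sample = [0.4, 0.6, 0.5]  # no family scores highly

known_prediction = predict_open_set(confident_sample, threshold)
novel_prediction = predict_open_set(ambiguous_sample, threshold)
```

Because the rule only inspects scores the classifier already produces, it adds one comparison per sample at inference time, which is consistent with the abstract's claim of minimal computational overhead.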

Related papers

Addressing malware family concept drift with triplet autoencoder [2.416907802598482]
Concept drift can occur in two forms: the emergence of entirely new malware families and the evolution of existing ones. This paper proposes an innovative method to address the former, focusing on effectively identifying new malware families. Our results demonstrate a significant improvement in detecting new malware families, offering a reliable solution for ongoing cybersecurity challenges.
arXiv Detail & Related papers (2025-07-01T00:55:00Z)
MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware. We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph. This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z)
MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning [8.724680868086626]
MalMixer is a semi-supervised malware family classifier that achieves high accuracy with sparse training data. We present a domain-knowledge-aware data augmentation technique for malware feature representations, enhancing few-shot performance of semi-supervised malware family classification.
arXiv Detail & Related papers (2024-09-20T04:50:49Z)
Catch'em all: Classification of Rare, Prominent, and Novel Malware Families [3.147175286021779]
Malware remains one of the most dangerous and costly cyber threats. As of last year, researchers reported 1.3 billion known malware specimens. These challenges include detection of novel malware and the ability to perform malware classification in the face of class imbalance.
arXiv Detail & Related papers (2024-03-04T23:46:19Z)
EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from quantifications-based malware detection towards machine learning. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
CNS-Net: Conservative Novelty Synthesizing Network for Malware Recognition in an Open-set Scenario [14.059646012441313]
We study the challenging task of malware recognition on both known and novel unknown malware families, called malware open-set recognition (MOSR). In this paper, we propose a novel model that can conservatively synthesize malware instances to mimic unknown malware families. We also build a new large-scale malware dataset, named MAL-100, to fill the gap left by the lack of a large open-set malware benchmark dataset.
arXiv Detail & Related papers (2023-05-02T07:31:42Z)
A survey on hardware-based malware detection approaches [45.24207460381396]
Hardware-based malware detection approaches leverage hardware performance counters and machine learning prowess. We meticulously analyze the approach, unraveling the most common methods, algorithms, tools, and datasets that shape its contours. The discussion extends to crafting mixed hardware and software approaches for collaborative efficacy, essential enhancements in hardware monitoring units, and a better understanding of the correlation between hardware events and malware applications.
arXiv Detail & Related papers (2023-03-22T13:00:41Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery [23.294653273180472]
We show how a malicious actor trains a surrogate model to discover binary mutations that cause an instance to be misclassified. Then, mutated malware is sent to the victim model that takes the place of an antivirus API to test whether it can evade detection.
arXiv Detail & Related papers (2021-06-15T03:31:02Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
Deep Learning and Open Set Malware Classification: A Survey [0.0]
Recent machine learning works have shed light on the Open Set Recognition (OSR) problem. An OSR system should not only correctly classify the known classes, but also recognize the unknown class. This survey provides an overview of different deep learning techniques, a discussion of OSR and graph representation solutions, and an introduction to malware classification systems.
arXiv Detail & Related papers (2020-04-08T21:36:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.