Related papers: High-resolution Image-based Malware Classification using Multiple Instance Learning

High-resolution Image-based Malware Classification using Multiple Instance Learning

URL: http://arxiv.org/abs/2311.12760v1
Date: Tue, 21 Nov 2023 18:11:26 GMT
Title: High-resolution Image-based Malware Classification using Multiple Instance Learning
Authors: Tim Peters, Hikmat Farhat
Abstract summary: This paper proposes a novel method of classifying malware into families using high-resolution greyscale images and multiple instance learning. The implementation is evaluated on the Microsoft Malware Classification dataset and achieves accuracies of up to $96.6%$ on adversarially enlarged samples.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper proposes a novel method of classifying malware into families using high-resolution greyscale images and multiple instance learning to overcome adversarial binary enlargement. Current methods of visualisation-based malware classification largely rely on lossy transformations of inputs such as resizing to handle the large, variable-sized images. Through empirical analysis and experimentation, it is shown that these approaches cause crucial information loss that can be exploited. The proposed solution divides the images into patches and uses embedding-based multiple instance learning with a convolutional neural network and an attention aggregation function for classification. The implementation is evaluated on the Microsoft Malware Classification dataset and achieves accuracies of up to $96.6\%$ on adversarially enlarged samples compared to the baseline of $22.8\%$. The Python code is available online at https://github.com/timppeters/MIL-Malware-Images .

Related papers

Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size Comparison [5.480305055542485]
We present a method for curating large-scale image datasets of invertebrates. Our approach is based on extracting feature embeddings with pretrained deep neural networks. Also, we show that a simple area-based size comparison approach is able to find a lot of common erroneous images.
arXiv Detail & Related papers (2024-12-20T12:35:41Z)
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition. Our method uses the attention mechanism to correlate multiple images within a batch. Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge. We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem. Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
AdvMask: A Sparse Adversarial Attack Based Data Augmentation Method for Image Classification [8.926478245654703]
We propose a novel data augmentation method, AdvMask, for image classification tasks. Instead of randomly removing areas in the images, AdvMask obtains the key points that have the greatest influence on the classification results. The experimental results on various datasets and CNN models verify that the proposed method outperforms other data augmentation methods in image classification tasks.
arXiv Detail & Related papers (2022-11-29T09:25:53Z)
Facing the Void: Overcoming Missing Data in Multi-View Imagery [0.783788180051711]
We propose a novel technique for multi-view image classification robust to this problem. The proposed method, based on state-of-the-art deep learning-based approaches and metric learning, can be easily adapted and exploited in other applications and domains. Results show that the proposed algorithm provides improvements in multi-view image classification accuracy when compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T13:21:27Z)
Deep Image Deblurring: A Survey [165.32391279761006]
Deblurring is a classic problem in low-level computer vision, which aims to recover a sharp image from a blurred input image. Recent advances in deep learning have led to significant progress in solving this problem.
arXiv Detail & Related papers (2022-01-26T01:31:30Z)
Virus-MNIST: Machine Learning Baseline Calculations for Image Classification [0.0]
The Virus-MNIST data set is a collection of thumbnail images that is similar in style to the ubiquitous MNIST hand-written digits. It is poised to take on a role in benchmarking progress of virus model training.
arXiv Detail & Related papers (2021-11-03T17:44:23Z)
Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning [16.224649756613655]
We propose a novel method to detect and visualize malware through image classification. The executable binaries are represented as grayscale images obtained from the count of N-grams (N=2) of bytes in the Discrete Cosine Transform domain. A shallow neural network is trained for classification, and its accuracy is compared with deep-network architectures such as ResNet that are trained using transfer learning.
arXiv Detail & Related papers (2021-01-26T06:07:46Z)
Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing. In this paper, we develop a novel generative method named generative adversarial U-Net. Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning. Current contrastive models are ineffective at localizing the foreground object. We propose a data-driven approach for learning in variance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
Learning Transformation-Aware Embeddings for Image Forensics [15.484408315588569]
Image Provenance Analysis aims at discovering relationships among different manipulated image versions that share content. One of the main sub-problems for provenance analysis that has not yet been addressed directly is the edit ordering of images that share full content or are near-duplicates. This paper introduces a novel deep learning-based approach to provide a plausible ordering to images that have been generated from a single image through transformations.
arXiv Detail & Related papers (2020-01-13T22:01:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.