OpCode-Based Malware Classification Using Machine Learning and Deep Learning Techniques
- URL: http://arxiv.org/abs/2504.13408v1
- Date: Fri, 18 Apr 2025 02:09:57 GMT
- Title: OpCode-Based Malware Classification Using Machine Learning and Deep Learning Techniques
- Authors: Varij Saini, Rudraksh Gupta, Neel Soni
- Abstract summary: This report presents a comprehensive analysis of malware classification using OpCode sequences. Two distinct approaches are evaluated: traditional machine learning using n-gram analysis with Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers; and a deep learning approach employing a Convolutional Neural Network (CNN).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This technical report presents a comprehensive analysis of malware classification using OpCode sequences. Two distinct approaches are evaluated: traditional machine learning using n-gram analysis with Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers; and a deep learning approach employing a Convolutional Neural Network (CNN). The traditional machine learning approach establishes a baseline using handcrafted 1-gram and 2-gram features from disassembled malware samples. The deep learning methodology builds upon the work proposed in "Deep Android Malware Detection" by McLaughlin et al. and evaluates the performance of a CNN model trained to automatically extract features from raw OpCode data. Empirical results are compared using standard performance metrics (accuracy, precision, recall, and F1-score). While the SVM classifier outperforms other traditional techniques, the CNN model demonstrates competitive performance with the added benefit of automated feature extraction.
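The handcrafted 1-gram and 2-gram features and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the opcode mnemonics, labels, and helper names (`opcode_ngrams`, `build_vocabulary`, `vectorize`, `precision_recall_f1`) are hypothetical and chosen for illustration, not taken from the report, whose exact preprocessing and classifier settings are not given here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in a single opcode sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def build_vocabulary(samples: Sequence[Sequence[str]],
                     n_values: Sequence[int] = (1, 2)) -> List[NGram]:
    """Collect every n-gram seen in the training samples, in sorted order."""
    vocab = set()
    for opcodes in samples:
        for n in n_values:
            vocab.update(opcode_ngrams(opcodes, n))
    return sorted(vocab)


def vectorize(opcodes: Sequence[str], vocab: Sequence[NGram],
              n_values: Sequence[int] = (1, 2)) -> List[int]:
    """Map one opcode sequence to a count vector aligned with the vocabulary."""
    counts = Counter()
    for n in n_values:
        counts.update(opcode_ngrams(opcodes, n))
    return [counts[gram] for gram in vocab]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: two short disassembled opcode sequences.
samples = [
    ["push", "mov", "call", "ret"],
    ["mov", "mov", "xor", "ret"],
]
vocab = build_vocabulary(samples)
features = [vectorize(s, vocab) for s in samples]
print(len(vocab), "features per sample")
```

Count vectors of this kind could then be passed to any off-the-shelf SVM, KNN, or Decision Tree implementation. The sorted vocabulary keeps feature columns in a stable order between training and test samples.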
Related papers
- Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning [0.3277163122167433]
This paper presents a framework for mapping malicious executables to known Advanced Persistent Threat (APT) groups.
The main feature of this analysis is the assembly-level instructions present in executables which are also known as opcodes.
Traditional and deep learning models are applied to create models capable of classifying malware samples.
arXiv Detail & Related papers (2025-04-22T00:05:05Z)
- Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox [0.0]
This study aims to classify and identify malware extracted from a dataset containing API call sequences.
Both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
arXiv Detail & Related papers (2023-11-07T22:33:17Z)
- A Natural Language Processing Approach to Malware Classification [2.707154152696381]
In this research, we consider a hybrid architecture, where Hidden Markov Models (HMM) are trained on opcode sequences.
Extracting the HMM hidden state sequences can be viewed as a form of feature engineering.
We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset.
arXiv Detail & Related papers (2023-07-07T23:16:23Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual Words model and Deep Convolutional Neural Networks (DCNN).
Secondly, we will implement a Bag of Visual Words model and the VGG16 CNN architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
- Rethinking Nearest Neighbors for Visual Classification [56.00783095670361]
k-NN is a lazy learning method that aggregates the distance between the test image and top-k neighbors in a training set.
We adopt k-NN with pre-trained visual representations produced by either supervised or self-supervised methods in two steps.
Via extensive experiments on a wide range of classification tasks, our study reveals the generality and flexibility of k-NN integration.
arXiv Detail & Related papers (2021-12-15T20:15:01Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- CNN vs ELM for Image-Based Malware Classification [3.4806267677524896]
We train and evaluate machine learning models for malware classification, based on features that can be obtained without disassembly or execution of code.
We find that ELMs can achieve accuracies on par with CNNs, yet ELM training requires less than 2% of the time needed to train a comparable CNN.
arXiv Detail & Related papers (2021-03-24T00:51:06Z)
- Malware Classification with Word Embedding Features [6.961253535504979]
Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences.
We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models.
We conduct substantial experiments over a variety of malware families.
arXiv Detail & Related papers (2021-03-03T21:57:11Z)
- Improved Code Summarization via a Graph Neural Network [96.03715569092523]
In general, source code summarization techniques take the source code as input and output a natural language description.
We present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries.
arXiv Detail & Related papers (2020-04-06T17:36:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.