A Natural Language Processing Approach to Malware Classification
- URL: http://arxiv.org/abs/2307.11032v1
- Date: Fri, 7 Jul 2023 23:16:23 GMT
- Title: A Natural Language Processing Approach to Malware Classification
- Authors: Ritik Mehta, Olha Jurečková, and Mark Stamp
- Abstract summary: In this research, we consider a hybrid architecture in which Hidden Markov Models (HMMs) are trained on opcode sequences. Extracting the HMM hidden state sequences can be viewed as a form of feature engineering.
We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset.
- Score: 2.707154152696381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many different machine learning and deep learning techniques have been
successfully employed for malware detection and classification. Examples of
popular learning techniques in the malware domain include Hidden Markov Models
(HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector
Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term
Memory (LSTM) networks. In this research, we consider a hybrid architecture,
where HMMs are trained on opcode sequences, and the resulting hidden states of
these trained HMMs are used as feature vectors in various classifiers. In this
context, extracting the HMM hidden state sequences can be viewed as a form of
feature engineering that is somewhat analogous to techniques that are commonly
employed in Natural Language Processing (NLP). We find that this NLP-based
approach outperforms other popular techniques on a challenging malware dataset,
with an HMM-Random Forest model yielding the best results.
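To make the architecture concrete, here is a minimal sketch of such a hybrid HMM-Random Forest pipeline, assuming hmmlearn and scikit-learn. The number of hidden states, the opcode vocabulary size, the synthetic data, and the hidden-state frequency featurization are all illustrative assumptions; the paper feeds its own encoding of the hidden state sequences to several classifiers.

```python
# Minimal sketch of the hybrid pipeline: an HMM trained on opcode sequences,
# with decoded hidden-state sequences turned into classifier features.
# All hyperparameters and the synthetic data are illustrative assumptions.
import numpy as np
from hmmlearn import hmm  # hmmlearn >= 0.3 provides CategoricalHMM
from sklearn.ensemble import RandomForestClassifier

N_STATES = 2    # number of HMM hidden states (assumption)
N_SYMBOLS = 30  # size of the opcode vocabulary (assumption)

def train_opcode_hmm(sequences):
    """Fit a discrete-observation HMM on concatenated opcode-index sequences."""
    X = np.concatenate(sequences).reshape(-1, 1)
    lengths = [len(s) for s in sequences]
    model = hmm.CategoricalHMM(n_components=N_STATES, n_iter=100, random_state=0)
    model.fit(X, lengths)
    return model

def hidden_state_features(model, seq):
    """Viterbi-decode one sequence; return hidden-state frequencies as features."""
    states = model.predict(np.asarray(seq).reshape(-1, 1))
    return np.bincount(states, minlength=N_STATES) / len(states)

# Synthetic stand-in for opcode-index sequences from two malware families.
rng = np.random.default_rng(0)
sequences = [rng.integers(0, N_SYMBOLS, size=200) for _ in range(20)]
labels = [i % 2 for i in range(20)]

model = train_opcode_hmm(sequences)
feats = np.vstack([hidden_state_features(model, s) for s in sequences])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(feats, labels)
```

Viterbi decoding turns each opcode sequence into a hidden-state sequence; summarizing it as a frequency vector is one simple way to obtain fixed-length inputs for the Random Forest.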
Related papers
- Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox [0.0]
This study aims to classify and identify malware extracted from a dataset containing API call sequences.
Both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
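As a hedged illustration of one common formulation of this task, API call traces can be treated as token sequences and reduced to bag-of-n-gram features for a classifier. The API names, labels, and model choice below are assumptions, not the study's exact setup.

```python
# Illustrative only: classify API call traces via n-gram counts + Random Forest.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Each sample is one space-joined API call trace (hypothetical examples).
traces = [
    "NtOpenFile NtReadFile NtWriteFile NtClose",
    "RegOpenKeyEx RegSetValueEx RegCloseKey",
]
labels = [0, 1]  # hypothetical benign/malicious labels

pipeline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pipeline.fit(traces, labels)
```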
arXiv Detail & Related papers (2023-11-07T22:33:17Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
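As a generic sketch of the memory-token idea (not the paper's specific heterogeneous design), learnable tokens can be concatenated with the input so attention reads from them as extra keys and values; all sizes below are assumptions.

```python
import torch
import torch.nn as nn

class MemoryAugmentedLayer(nn.Module):
    """Generic sketch: learnable memory tokens joined to the input as extra
    keys/values for attention. Sizes are illustrative assumptions."""
    def __init__(self, d_model=64, n_memory=8, n_heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_memory, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        kv = torch.cat([mem, x], dim=1)  # queries attend over memory + input
        out, _ = self.attn(x, kv, kv)
        return out

layer = MemoryAugmentedLayer()
y = layer(torch.randn(2, 10, 64))  # toy usage
```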
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
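For orientation, here is a minimal CPU-side snnTorch usage sketch: a single leaky integrate-and-fire neuron stepped through time. The hyperparameters are illustrative, and the IPU-optimized release described in the paper is not exercised here.

```python
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)         # leaky integrate-and-fire neuron
mem = lif.init_leaky()            # initialize the membrane potential
inputs = 0.5 * torch.rand(20, 1)  # hypothetical input current, 20 time steps

spikes = []
for step in range(inputs.size(0)):
    spk, mem = lif(inputs[step], mem)  # one simulation time step
    spikes.append(spk)
```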
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Concurrent Neural Tree and Data Preprocessing AutoML for Image Classification [0.5735035463793008]
Current state-of-the-art (SOTA) methods do not include traditional methods for manipulating input data as part of the algorithmic search space.
We adapt the Evolutionary Multi-objective Algorithm Design Engine (EMADE), a multi-objective evolutionary search framework for traditional machine learning methods, to perform neural architecture search.
We show that including these methods in the search space has the potential to improve performance on the CIFAR-10 image classification benchmark.
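The sketch below is a generic illustration of that idea, not EMADE itself: preprocessing choices enter the same search space as the model, and candidates are scored jointly. A tiny random search stands in for EMADE's evolutionary, multi-objective search, and the dataset and options are assumptions.

```python
# Generic illustration (not EMADE): search jointly over preprocessing + model.
import random
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_digits(return_X_y=True)
preprocessors = [None, StandardScaler(), MinMaxScaler()]

best_score, best_pipe = -1.0, None
random.seed(0)
for _ in range(6):  # tiny random search over the joint space
    prep = random.choice(preprocessors)
    n_trees = random.choice([50, 100, 200])
    steps = ([prep] if prep is not None else []) + [
        RandomForestClassifier(n_estimators=n_trees, random_state=0)
    ]
    pipe = make_pipeline(*steps)
    score = cross_val_score(pipe, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipe = score, pipe
```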
arXiv Detail & Related papers (2022-05-25T20:03:09Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
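A hedged numpy sketch of the ADMM quantization loop follows: alternate a penalized weight update, projection onto a low-bit grid, and a dual update. The quantizer, learning rates, and dummy gradient are assumptions; the paper applies this scheme inside full RNNLM training rather than to a standalone weight vector.

```python
import numpy as np

def quantize(w, bits=2):
    """Project weights onto a symmetric low-bit grid (illustrative quantizer)."""
    levels = 2 ** (bits - 1) - 1  # e.g. 2 bits -> grid points {-1, 0, +1}
    scale = np.max(np.abs(w)) / max(levels, 1) + 1e-12
    return np.clip(np.round(w / scale), -levels, levels) * scale

def admm_quantize(w, grad_fn, steps=50, rho=1e-3, lr=1e-2):
    """Skeleton ADMM loop; grad_fn(w) returns the task-loss gradient."""
    u = np.zeros_like(w)  # scaled dual variable
    q = quantize(w)       # quantized copy of the weights
    for _ in range(steps):
        # 1) primal step: task gradient plus ADMM penalty rho * (w - q + u)
        w = w - lr * (grad_fn(w) + rho * (w - q + u))
        # 2) projection step: re-quantize the shifted weights
        q = quantize(w + u)
        # 3) dual update
        u = u + w - q
    return q

# Toy usage with a dummy quadratic loss 0.5*||w||^2, so grad = w (assumption).
w0 = np.random.default_rng(0).normal(size=16)
wq = admm_quantize(w0, grad_fn=lambda w: w)
```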
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Malware Classification with Word Embedding Features [6.961253535504979]
Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences.
We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models.
We conduct substantial experiments over a variety of malware families.
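A minimal sketch of one such word-embedding featurization is shown below, assuming gensim's Word2Vec over opcode mnemonics; the toy sequences, hyperparameters, and mean-pooling step are illustrative, and the paper also engineers features from trained HMMs.

```python
import numpy as np
from gensim.models import Word2Vec

# Hypothetical opcode-mnemonic sequences, one list per sample.
samples = [
    ["mov", "add", "jmp", "mov", "cmp"],
    ["push", "pop", "call", "ret", "mov"],
]

# Train opcode embeddings; vector_size/window/min_count are assumptions.
w2v = Word2Vec(sentences=samples, vector_size=16, window=3, min_count=1, seed=0)

# One simple per-sample feature vector: the mean of its opcode embeddings.
feats = np.vstack(
    [np.mean([w2v.wv[op] for op in seq], axis=0) for seq in samples]
)
```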
arXiv Detail & Related papers (2021-03-03T21:57:11Z)
- Introducing the Hidden Neural Markov Chain framework [7.85426761612795]
This paper proposes the original Hidden Neural Markov Chain (HNMC) framework, a new family of sequential neural models.
We propose three different models: the classic HNMC, the HNMC2, and the HNMC-CN.
These experiments show the potential of this new neural sequential framework, which may open the way to new models and could eventually compete with the prevalent BiLSTM and BiGRU.
arXiv Detail & Related papers (2021-02-17T20:13:45Z)
- A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning [32.59760685342343]
Probabilistic Latent Variable Models provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
In this work, we propose ConvDMM, a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks.
When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods.
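For intuition, a bare-bones PyTorch sketch of a deep Markov model's generative pass follows: Gaussian latents with neural transition and emission functions. It omits ConvDMM's convolutional encoder and inference network, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyDMM(nn.Module):
    """Bare-bones deep Markov model generative pass (illustrative sizes)."""
    def __init__(self, z_dim=8, x_dim=40):
        super().__init__()
        self.z_dim = z_dim
        # Transition net outputs mean and log-variance of the next latent.
        self.trans = nn.Sequential(
            nn.Linear(z_dim, 32), nn.Tanh(), nn.Linear(32, 2 * z_dim)
        )
        # Emission net maps a latent to the observation mean (e.g. features).
        self.emit = nn.Sequential(
            nn.Linear(z_dim, 32), nn.Tanh(), nn.Linear(32, x_dim)
        )

    def forward(self, T=10):
        z = torch.zeros(1, self.z_dim)
        xs = []
        for _ in range(T):
            mu, logvar = self.trans(z).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sample z_t
            xs.append(self.emit(z))  # observation mean at step t
        return torch.stack(xs, dim=1)  # (1, T, x_dim)

x = TinyDMM()(T=5)  # toy generative rollout
```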
arXiv Detail & Related papers (2020-06-03T21:50:20Z)
- The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding [97.85957811603251]
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.
Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks.
A unique feature of MT-DNN is its built-in support for robust and transferable learning using the adversarial multi-task learning paradigm.
arXiv Detail & Related papers (2020-02-19T03:05:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.