Classifying Malware Using Function Representations in a Static Call Graph
- URL: http://arxiv.org/abs/2012.01939v1
- Date: Tue, 1 Dec 2020 20:36:19 GMT
- Title: Classifying Malware Using Function Representations in a Static Call Graph
- Authors: Thomas Dalton, Mauritius Schmidtler, Alireza Hadj Khodabakhshi
- Abstract summary: We propose a deep learning approach for identifying malware families using the function call graphs of x86 assembly instructions.
We test our approach by performing several experiments on a Microsoft malware classification data set and achieve excellent separation between malware families with a classification accuracy of 99.41%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a deep learning approach for identifying malware families using
the function call graphs of x86 assembly instructions. Though prior work on
static call graph analysis exists, very little involves the application of
modern, principled feature learning techniques to the problem. In this paper,
we introduce a system utilizing an executable's function call graph where
function representations are obtained by way of a recurrent neural network
(RNN) autoencoder which maps sequences of x86 instructions into dense, latent
vectors. These function embeddings are then modeled as vertices in a graph with
edges indicating call dependencies. Capturing rich, node-level representations
as well as global, topological properties of an executable file greatly
improves malware family detection rates and contributes to a more principled
approach to the problem in a way that deliberately avoids tedious feature
engineering and domain expertise. We test our approach by performing several
experiments on a Microsoft malware classification data set and achieve
excellent separation between malware families with a classification accuracy of
99.41%.
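The pipeline described in the abstract (embed each function, place the embeddings on call-graph vertices, then summarize the graph) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: the RNN autoencoder is replaced by a plain opcode-frequency vector, the single round of mean aggregation over callees and the mean pooling are stand-ins chosen for brevity, and `OPCODES`, `embed_function`, `aggregate`, and `graph_vector` are hypothetical names.

```python
# Illustrative sketch only. The paper learns function embeddings with an RNN
# autoencoder; here a normalized opcode-count vector stands in for it
# (an assumption made to keep the example dependency-free).
from collections import Counter

OPCODES = ["mov", "push", "call", "ret"]  # hypothetical toy vocabulary


def embed_function(instructions):
    """Stand-in for the learned embedding: relative opcode frequencies."""
    counts = Counter(ins.split()[0] for ins in instructions)
    total = sum(counts[op] for op in OPCODES) or 1
    return [counts[op] / total for op in OPCODES]


def aggregate(embeddings, call_edges):
    """One round of mean aggregation over each function and its callees."""
    out = {}
    for name, vec in embeddings.items():
        callees = [embeddings[dst] for src, dst in call_edges if src == name]
        group = [vec] + callees
        out[name] = [sum(col) / len(group) for col in zip(*group)]
    return out


def graph_vector(node_vectors):
    """Mean-pool node vectors into one fixed-size vector per executable."""
    vecs = list(node_vectors.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Toy executable: two functions and one call edge (caller, callee).
functions = {
    "main": ["push ebp", "mov ebp, esp", "call helper", "ret"],
    "helper": ["mov eax, 1", "ret"],
}
call_edges = [("main", "helper")]

embeddings = {name: embed_function(body) for name, body in functions.items()}
features = graph_vector(aggregate(embeddings, call_edges))
print(features)
```

In a setup like this, `features` would be passed to a family classifier; the point of the sketch is only that each vertex carries a function-level vector while the edges let call structure influence the final representation.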
Related papers
- Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets [0.7646713951724013]
This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets.
A specialized graph neural network model operates on this graph representation, learning to map it to a feature vector that encodes semantic binary code similarities.
Experimental results show that the combination of call graphlets and the novel graph neural network architecture achieves comparable or state-of-the-art performance.
arXiv Detail & Related papers (2024-06-02T18:26:50Z) - Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - GIF: A General Graph Unlearning Strategy via Influence Function [63.52038638220563]
Graph Influence Function (GIF) is a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.
We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify GIF's superiority in terms of unlearning efficacy, model utility, and unlearning efficiency.
arXiv Detail & Related papers (2023-04-06T03:02:54Z) - A Comparison of Graph Neural Networks for Malware Classification [2.707154152696381]
We train a wide range of Graph Neural Network (GNN) architectures to generate embeddings which we then classify.
We find that our best GNN models outperform previous comparable research involving the well-known MalNet-Tiny Android malware dataset.
arXiv Detail & Related papers (2023-03-22T01:05:57Z) - State of the Art and Potentialities of Graph-level Learning [54.68482109186052]
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more.
Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures.
Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
arXiv Detail & Related papers (2023-01-14T09:15:49Z) - Learning Heuristics for the Maximum Clique Enumeration Problem Using Low Dimensional Representations [0.0]
We use a learning framework to prune the input graph, with the aim of reducing the runtime of the maximum clique enumeration problem.
We study how different vertex representations affect the performance of this heuristic method.
We observe that using local graph features in the classification process produces more accurate results when combined with a feature elimination process.
arXiv Detail & Related papers (2022-10-30T22:04:32Z) - Malware Analysis with Symbolic Execution and Graph Kernel [2.1377923666134113]
We propose a new efficient open source toolchain for machine learning-based classification.
We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs.
arXiv Detail & Related papers (2022-04-12T08:52:33Z) - Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z) - Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We also provide a benchmark pipeline for evaluating temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z) - Time-varying Graph Representation Learning via Higher-Order Skip-Gram with Negative Sampling [0.456877715768796]
We build upon the fact that the skip-gram embedding approach implicitly performs a matrix factorization.
We show that higher-order skip-gram with negative sampling is able to disentangle the role of nodes and time.
We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned time-varying graph representations outperform state-of-the-art methods.
arXiv Detail & Related papers (2020-06-25T12:04:48Z) - Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs [54.13919050090926]
We propose an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs.
In particular, we first extract the $h$-hop enclosing subgraph centered on the target edge and propose the node labeling function to identify the role of each node in the subgraph.
Based on the extracted features, we utilize gated recurrent units (GRUs) to capture the temporal information for anomaly detection.
arXiv Detail & Related papers (2020-05-15T09:17:08Z)