Malware Analysis with Symbolic Execution and Graph Kernel
- URL: http://arxiv.org/abs/2204.05632v1
- Date: Tue, 12 Apr 2022 08:52:33 GMT
- Title: Malware Analysis with Symbolic Execution and Graph Kernel
- Authors: Charles-Henry Bertrand Van Ouytsel and Axel Legay
- Abstract summary: We propose a new efficient open source toolchain for machine learning-based classification.
We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs.
- Score: 2.1377923666134113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malware analysis techniques are divided into static and dynamic analysis.
Both techniques can be bypassed by circumvention techniques such as
obfuscation. In a series of works, the authors have promoted the use of
symbolic executions combined with machine learning to avoid such traps. Most of
those works rely on natural graph-based representations that can then be
plugged into graph-based learning algorithms such as Gspan. There are two main
problems with this approach. The first one is in the cost of computing the
graph. Indeed, working with graphs requires one to compute and representing the
entire state-space of the file under analysis. As such computation is too
cumbersome, the techniques often rely on developing strategies to compute a
representative subgraph of the behaviors. Unfortunately, efficient
graph-building strategies remain weakly explored. The second problem is in the
classification itself. Graph-based machine learning algorithms rely on
comparing the biggest common structures. This sidelines small but specific
parts of the malware signature. In addition, it does not allow us to work with
efficient algorithms such as support vector machine. We propose a new efficient
open source toolchain for machine learning-based classification. We also
explore how graph-kernel techniques can be used in the process. We focus on the
1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities
between graphs. Our experimental results show that our approach outperforms
existing ones by an impressive factor.
Related papers
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z) - State of the Art and Potentialities of Graph-level Learning [54.68482109186052]
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more.
Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures.
Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
arXiv Detail & Related papers (2023-01-14T09:15:49Z) - Learning node embeddings via summary graphs: a brief theoretical
analysis [55.25628709267215]
Graph representation learning plays an important role in many graph mining applications, but learning embeddings of large-scale graphs remains a problem.
Recent works try to improve scalability via graph summarization -- i.e., they learn embeddings on a smaller summary graph, and then restore the node embeddings of the original graph.
We give an in-depth theoretical analysis of three specific embedding learning methods based on introduced kernel matrix.
arXiv Detail & Related papers (2022-07-04T04:09:50Z) - Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications.
We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z) - Computing Graph Descriptors on Edge Streams [4.129847064263056]
We present streaming algorithms to approximately compute three different graph descriptors.
operating on edge streams allows us to avoid storing the entire graph in memory.
Our scalable algorithms compute descriptors of graphs with millions of edges within minutes.
arXiv Detail & Related papers (2021-09-02T06:21:47Z) - Partition and Code: learning how to compress graphs [50.29024357495154]
"Partition and Code" framework entails three steps: first, a partitioning algorithm decomposes the graph into elementary structures, then these are mapped to the elements of a small dictionary on which we learn a probability distribution, and finally, an entropy encoder translates the representation into bits.
Our algorithms are quantitatively evaluated on diverse real-world networks obtaining significant performance improvements with respect to different families of non-parametric and parametric graph compressor.
arXiv Detail & Related papers (2021-07-05T11:41:16Z) - Graph coarsening: From scientific computing to machine learning [11.728753892489776]
The goal of this paper is to take a broad look into coarsening techniques that have been successfully deployed in scientific computing.
In machine learning, graph coarsening goes under various names, e.g., graph downsampling or graph reduction.
As will be seen, a common strategy in these methods is to rely on spectral properties to define the coarse graph.
arXiv Detail & Related papers (2021-06-22T15:31:50Z) - Learnable Graph Matching: Incorporating Graph Partitioning with Deep
Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z) - Understanding Coarsening for Embedding Large-Scale Graphs [3.6739949215165164]
Proper analysis of graphs with Machine Learning (ML) algorithms has the potential to yield far-reaching insights into many areas of research and industry.
The irregular structure of graph data constitutes an obstacle for running ML tasks on graphs.
We analyze the impact of the coarsening quality on the embedding performance both in terms of speed and accuracy.
arXiv Detail & Related papers (2020-09-10T15:06:33Z) - About Graph Degeneracy, Representation Learning and Scalability [2.029783382155471]
We present two techniques taking advantage of the K-Core Decomposition to reduce the time and memory consumption of walk-based Graph Representation Learning algorithms.
We evaluate the performances, expressed in terms of quality of embedding and computational resources, of the proposed techniques on several academic datasets.
arXiv Detail & Related papers (2020-09-04T09:39:43Z) - Graph topology inference benchmarks for machine learning [16.857405938139525]
We introduce several benchmarks specifically designed to reveal the relative merits and limitations of graph inference methods.
We also contrast some of the most prominent techniques in the literature.
arXiv Detail & Related papers (2020-07-16T09:40:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.