Related papers: Malware Analysis with Symbolic Execution and Graph Kernel

Malware Analysis with Symbolic Execution and Graph Kernel

URL: http://arxiv.org/abs/2204.05632v1
Date: Tue, 12 Apr 2022 08:52:33 GMT
Title: Malware Analysis with Symbolic Execution and Graph Kernel
Authors: Charles-Henry Bertrand Van Ouytsel and Axel Legay
Abstract summary: We propose a new efficient open source toolchain for machine learning-based classification. We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs.
Score: 2.1377923666134113
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malware analysis techniques are divided into static and dynamic analysis. Both techniques can be bypassed by circumvention techniques such as obfuscation. In a series of works, the authors have promoted the use of symbolic executions combined with machine learning to avoid such traps. Most of those works rely on natural graph-based representations that can then be plugged into graph-based learning algorithms such as Gspan. There are two main problems with this approach. The first one is in the cost of computing the graph. Indeed, working with graphs requires one to compute and representing the entire state-space of the file under analysis. As such computation is too cumbersome, the techniques often rely on developing strategies to compute a representative subgraph of the behaviors. Unfortunately, efficient graph-building strategies remain weakly explored. The second problem is in the classification itself. Graph-based machine learning algorithms rely on comparing the biggest common structures. This sidelines small but specific parts of the malware signature. In addition, it does not allow us to work with efficient algorithms such as support vector machine. We propose a new efficient open source toolchain for machine learning-based classification. We also explore how graph-kernel techniques can be used in the process. We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs. Our experimental results show that our approach outperforms existing ones by an impressive factor.

Related papers

Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions. By finding a mean in this embedding space, we can recover a mean graph that preserves structural information. We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
State of the Art and Potentialities of Graph-level Learning [54.68482109186052]
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more. Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures. Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
arXiv Detail & Related papers (2023-01-14T09:15:49Z)
Learning node embeddings via summary graphs: a brief theoretical analysis [55.25628709267215]
Graph representation learning plays an important role in many graph mining applications, but learning embeddings of large-scale graphs remains a problem. Recent works try to improve scalability via graph summarization -- i.e., they learn embeddings on a smaller summary graph, and then restore the node embeddings of the original graph. We give an in-depth theoretical analysis of three specific embedding learning methods based on introduced kernel matrix.
arXiv Detail & Related papers (2022-07-04T04:09:50Z)
Benchmarking Node Outlier Detection on Graphs [90.29966986023403]
Graph outlier detection is an emerging but crucial machine learning task with numerous applications. We present the first comprehensive unsupervised node outlier detection benchmark for graphs called UNOD.
arXiv Detail & Related papers (2022-06-21T01:46:38Z)
Computing Graph Descriptors on Edge Streams [4.129847064263056]
We present streaming algorithms to approximately compute three different graph descriptors. operating on edge streams allows us to avoid storing the entire graph in memory. Our scalable algorithms compute descriptors of graphs with millions of edges within minutes.
arXiv Detail & Related papers (2021-09-02T06:21:47Z)
Partition and Code: learning how to compress graphs [50.29024357495154]
"Partition and Code" framework entails three steps: first, a partitioning algorithm decomposes the graph into elementary structures, then these are mapped to the elements of a small dictionary on which we learn a probability distribution, and finally, an entropy encoder translates the representation into bits. Our algorithms are quantitatively evaluated on diverse real-world networks obtaining significant performance improvements with respect to different families of non-parametric and parametric graph compressor.
arXiv Detail & Related papers (2021-07-05T11:41:16Z)
Graph coarsening: From scientific computing to machine learning [11.728753892489776]
The goal of this paper is to take a broad look into coarsening techniques that have been successfully deployed in scientific computing. In machine learning, graph coarsening goes under various names, e.g., graph downsampling or graph reduction. As will be seen, a common strategy in these methods is to rely on spectral properties to define the coarse graph.
arXiv Detail & Related papers (2021-06-22T15:31:50Z)
Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task. Existing methods mostly ignore the context information among tracklets and intra-frame detections. We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
Understanding Coarsening for Embedding Large-Scale Graphs [3.6739949215165164]
Proper analysis of graphs with Machine Learning (ML) algorithms has the potential to yield far-reaching insights into many areas of research and industry. The irregular structure of graph data constitutes an obstacle for running ML tasks on graphs. We analyze the impact of the coarsening quality on the embedding performance both in terms of speed and accuracy.
arXiv Detail & Related papers (2020-09-10T15:06:33Z)
About Graph Degeneracy, Representation Learning and Scalability [2.029783382155471]
We present two techniques taking advantage of the K-Core Decomposition to reduce the time and memory consumption of walk-based Graph Representation Learning algorithms. We evaluate the performances, expressed in terms of quality of embedding and computational resources, of the proposed techniques on several academic datasets.
arXiv Detail & Related papers (2020-09-04T09:39:43Z)
Graph topology inference benchmarks for machine learning [16.857405938139525]
We introduce several benchmarks specifically designed to reveal the relative merits and limitations of graph inference methods. We also contrast some of the most prominent techniques in the literature.
arXiv Detail & Related papers (2020-07-16T09:40:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.