Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets
- URL: http://arxiv.org/abs/2406.02606v1
- Date: Sun, 2 Jun 2024 18:26:50 GMT
- Title: Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets
- Authors: Joshua Collyer, Tim Watson, Iain Phillips
- Abstract summary: This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets.
A specialized graph neural network model is then designed to operate on this graph representation, learning to map it to a feature vector that encodes semantic code similarities.
Experimental results demonstrate that the combination of call graphlets and the novel graph neural network architecture achieves state-of-the-art performance.
- Score: 0.7646713951724013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Binary code similarity detection is an important problem with applications in areas like malware analysis, vulnerability research and plagiarism detection. This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets. A call graphlet encodes the neighborhood around each function in a binary executable, capturing the local and global context through a series of statistical features. A specialized graph neural network model is then designed to operate on this graph representation, learning to map it to a feature vector that encodes semantic code similarities using deep metric learning. The proposed approach is evaluated across four distinct datasets covering different architectures, compiler toolchains, and optimization levels. Experimental results demonstrate that the combination of call graphlets and the novel graph neural network architecture achieves state-of-the-art performance compared to baseline techniques across cross-architecture, mono-architecture and zero-shot tasks. In addition, our proposed approach also performs well when evaluated against an out-of-domain function inlining task. Overall, the work provides a general and effective graph neural network-based solution for conducting binary code similarity detection.
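To make the idea of a call graphlet concrete, here is a minimal sketch in Python. The abstract does not say how far the neighborhood extends or which statistical features are used, so everything below is an assumption made for illustration: the graphlet is taken to be the target function plus its direct callers and callees, and the per-node features are simply in-degree and out-degree in the full call graph. The function names (`build_call_graph`, `call_graphlet`, `node_features`) and the toy edge list are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_call_graph(edges):
    """Build callee and caller adjacency maps from (caller, callee) pairs."""
    callees = defaultdict(set)
    callers = defaultdict(set)
    for src, dst in edges:
        callees[src].add(dst)
        callers[dst].add(src)
    return callees, callers


def call_graphlet(edges, target):
    """Assumed one-hop graphlet: the target, its callers, its callees,
    and every call edge whose endpoints both lie in that node set."""
    callees, callers = build_call_graph(edges)
    nodes = {target} | callees[target] | callers[target]
    sub_edges = sorted((s, d) for s, d in set(edges) if s in nodes and d in nodes)
    return sorted(nodes), sub_edges


def node_features(edges, nodes):
    """Toy per-node statistics: (in-degree, out-degree) in the full call graph.
    These stand in for the paper's statistical features, which are not listed here."""
    callees, callers = build_call_graph(edges)
    return {n: (len(callers[n]), len(callees[n])) for n in nodes}


# Hypothetical call graph of a small binary.
EDGES = [
    ("main", "parse"),
    ("main", "run"),
    ("run", "hash"),
    ("parse", "hash"),
    ("hash", "memcpy"),
    ("log", "memcpy"),
]

nodes, sub_edges = call_graphlet(EDGES, "hash")
features = node_features(EDGES, nodes)
print(nodes)
print(sub_edges)
print(features)
```

In a pipeline like the one the abstract describes, a graph such as this (nodes, edges, and per-node features) would be the input to the graph neural network, whose output embedding is then compared against the embeddings of other functions.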
Related papers
- Graphcode: Learning from multiparameter persistent homology using graph neural networks [0.06138671548064355]
Graphcodes handle datasets that are filtered along two real-valued scale parameters.
Graphcodes yield an informative and interpretable summary.
They can be readily integrated in machine learning pipelines using graph neural networks.
arXiv Detail & Related papers (2024-05-23T08:22:00Z)
- Unsupervised Graph Neural Architecture Search with Disentangled Self-supervision [51.88848982611515]
Unsupervised graph neural architecture search remains unexplored in the literature.
We propose a novel Disentangled Self-supervised Graph Neural Architecture Search model.
Our model is able to achieve state-of-the-art performance against several baseline methods in an unsupervised manner.
arXiv Detail & Related papers (2024-03-08T05:23:55Z)
- GNN-LoFI: a Novel Graph Neural Network through Localized Feature-based Histogram Intersection [51.608147732998994]
Graph neural networks are increasingly becoming the framework of choice for graph-based machine learning.
We propose a new graph neural network architecture that substitutes classical message passing with an analysis of the local distribution of node features.
arXiv Detail & Related papers (2024-01-17T13:04:23Z)
- UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification [6.977634174845066]
A universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder.
The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features.
The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix.
arXiv Detail & Related papers (2023-08-03T09:32:50Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
For evaluation, we provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- Co-embedding of Nodes and Edges with Graph Neural Networks [13.020745622327894]
Graph embedding is a way to transform and encode the data structure in high dimensional and non-Euclidean feature space.
CensNet is a general graph embedding framework, which embeds both nodes and edges to a latent feature space.
Our approach achieves or matches the state-of-the-art performance in four graph learning tasks.
arXiv Detail & Related papers (2020-10-25T22:39:31Z)
- Graph Pooling with Node Proximity for Hierarchical Representation Learning [80.62181998314547]
We propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology.
Results show that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
arXiv Detail & Related papers (2020-06-19T13:09:44Z)
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
- Deep Graph Matching Consensus [19.94426142777885]
This work presents a two-stage neural architecture for learning and refining structural correspondences between graphs.
First, we use localized node embeddings computed by a graph neural network to obtain an initial ranking of soft correspondences between nodes.
Secondly, we employ synchronous message passing networks to iteratively re-rank the soft correspondences to reach a matching consensus in local neighborhoods between graphs.
arXiv Detail & Related papers (2020-01-27T08:05:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.