iCallee: Recovering Call Graphs for Binaries
- URL: http://arxiv.org/abs/2111.01415v2
- Date: Wed, 3 Nov 2021 02:57:17 GMT
- Title: iCallee: Recovering Call Graphs for Binaries
- Authors: Wenyu Zhu, Zhiyao Feng, Zihan Zhang, Zhijian Ou, Min Yang, Chao Zhang
- Abstract summary: Existing indirect callee recognition solutions for binaries all suffer from high false positive and false negative rates, making the recovered call graphs inaccurate.
We propose a new solution, iCallee, based on Siamese neural networks and inspired by advances in question-answering applications.
We have implemented a prototype of iCallee and evaluated it on several groups of targets.
- Score: 31.73821825871851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering programs' call graphs is crucial for inter-procedural analysis
tasks and applications based on them. The core challenge is recognizing targets
of indirect calls (i.e., indirect callees). It becomes more challenging when
target programs are in binary form, due to the information loss in binaries.
Existing indirect callee recognition solutions for binaries all suffer from high
false positive and false negative rates, making the recovered call graphs inaccurate.
In this paper, we propose a new solution, iCallee, based on Siamese neural
networks and inspired by advances in question-answering applications. The key
insight is that a neural network can learn to answer whether a callee function
is a potential target of an indirect callsite by comprehending their contexts,
i.e., the instructions near the callsite and within the callee. Following this
insight, we first preprocess target binaries to extract the contexts of
callsites and callees. Then, we build a customized natural language processing
(NLP) model applicable to assembly language. Further, we collect abundant
callsite-callee pairs, embed their contexts with the NLP model, and train a
Siamese network and a classifier to answer the callsite-callee question. We
have implemented a prototype of iCallee and evaluated it on several groups of
targets. Evaluation results show that our solution matches callsites to callees
with an F1-measure of 93.7%, recall of 93.8%, and precision of 93.5%, much
better than state-of-the-art solutions. To show its usefulness, we applied
iCallee to two specific applications, binary code similarity detection and
binary program hardening, and found that it greatly improves state-of-the-art
solutions.
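As a rough illustration of the callsite-callee matching idea in the abstract, the sketch below shows a minimal Siamese-style matcher: two weight-sharing encoders embed the callsite context and the callee context, and a small classifier scores the pair. The tokenizer, encoder choice, and dimensions are placeholders; the paper's actual system uses a customized assembly-language NLP model, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class SiameseCallMatcher(nn.Module):
    """Sketch of a Siamese matcher for callsite/callee contexts.

    Both contexts go through the *same* encoder (shared weights); a small
    classifier then decides whether the callee is a plausible target of the
    indirect callsite. All hyperparameters here are illustrative.
    """

    def __init__(self, vocab_size=4096, embed_dim=128, hidden_dim=256):
        super().__init__()
        # A shared embedding + GRU stands in for the customized NLP model.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded context instructions
        _, h = self.encoder(self.embed(token_ids))
        return h.squeeze(0)                              # (batch, hidden_dim)

    def forward(self, callsite_ids, callee_ids):
        site_vec = self.encode(callsite_ids)             # shared weights
        callee_vec = self.encode(callee_ids)             # shared weights
        pair = torch.cat([site_vec, callee_vec], dim=-1)
        return torch.sigmoid(self.classifier(pair))      # P(callee is a target)

# Usage with random token ids standing in for tokenized assembly contexts:
model = SiameseCallMatcher()
callsite = torch.randint(1, 4096, (2, 64))
callee = torch.randint(1, 4096, (2, 64))
scores = model(callsite, callee)                         # (2, 1) match scores
```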
Related papers
- Resolving Indirect Calls in Binary Code via Cross-Reference Augmented Graph Neural Networks [13.11143749397866]
NeuCall is a novel approach for resolving indirect calls using graph neural networks.
We leverage advanced compiler-level type analysis to generate high-quality callsite-callee training pairs.
NeuCall achieves an F1 score of 95.2%, outperforming state-of-the-art ML-based approaches.
arXiv Detail & Related papers (2025-07-24T20:54:41Z)
- Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks [15.40199816880172]
Previous work shows that even advanced solutions produce false edges and miss valid ones.
Our main idea is to frame the problem as link prediction on full program graphs, using a rich representation with multiple edge types.
Our results show that learning-based methods can improve the recall of JavaScript call graph construction.
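To make the link-prediction framing above concrete, here is a minimal sketch under assumed shapes and features (it is not this paper's model or graph schema): node embeddings come from one round of mean-aggregation message passing over a program graph, and an MLP scores each candidate callsite-to-function edge.

```python
import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    """Toy link predictor: does callsite node u call function node v?"""

    def __init__(self, in_dim=32, hid_dim=64):
        super().__init__()
        self.msg = nn.Linear(in_dim, hid_dim)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1)
        )

    def forward(self, x, adj, candidate_edges):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency matrix;
        # candidate_edges: (E, 2) possible callsite -> callee node pairs.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.msg(adj @ x / deg))          # mean-aggregated neighbors
        src, dst = candidate_edges[:, 0], candidate_edges[:, 1]
        pair = torch.cat([h[src], h[dst]], dim=-1)
        return torch.sigmoid(self.scorer(pair)).squeeze(-1)  # P(edge exists)

# Tiny synthetic program graph: 6 nodes, a few known edges, 2 candidate calls.
x = torch.randn(6, 32)
adj = torch.zeros(6, 6)
adj[0, 1] = adj[1, 2] = adj[3, 4] = 1.0
candidates = torch.tensor([[0, 5], [3, 2]])
probs = EdgeScorer()(x, adj, candidates)                 # one probability per candidate
```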
arXiv Detail & Related papers (2025-06-22T22:26:44Z)
- Multi-turn Response Selection with Commonsense-enhanced Language Models [32.921901489497714]
We design a Siamese network in which a pre-trained language model merges with a graph neural network (SinLG).
SinLG takes advantage of pre-trained language models (PLMs) to capture the word correlations in the context and response candidates.
The GNN aims to assist the PLM in fine-tuning and to arouse its related memories to attain better performance.
arXiv Detail & Related papers (2024-07-26T03:13:47Z)
- Know Your Neighborhood: General and Zero-Shot Capable Binary Function Search Powered by Call Graphlets [0.7646713951724013]
This paper proposes a novel graph neural network architecture combined with a novel graph data representation called call graphlets.
A specialized graph neural network model operates on this graph representation, learning to map it to a feature vector that encodes semantic binary code similarities.
Experimental results show that the combination of call graphlets and the novel graph neural network architecture achieves comparable or state-of-the-art performance.
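As a loose sketch of the "map a call graphlet to a feature vector and compare" idea above (not the paper's architecture or features), a tiny message-passing encoder can pool a function's local call-graph neighborhood into one vector, which is then compared across binaries with cosine similarity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphletEncoder(nn.Module):
    """Toy encoder: pool a call graphlet's node features into one vector.

    One linear message-passing step plus mean pooling stands in for the
    paper's GNN; the resulting embeddings are compared with cosine similarity.
    """

    def __init__(self, in_dim=16, out_dim=64):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) features of a function and its call neighborhood;
        # adj: (N, N) adjacency of the call graphlet.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.lin((adj @ x + x) / (deg + 1)))  # include self-loop
        return h.mean(dim=0)                                 # (out_dim,) embedding

# Two call graphlets, e.g. the same function compiled for different targets:
enc = GraphletEncoder()
g1 = enc(torch.randn(5, 16), torch.ones(5, 5))
g2 = enc(torch.randn(7, 16), torch.ones(7, 7))
similarity = F.cosine_similarity(g1, g2, dim=0)              # higher = more similar
```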
arXiv Detail & Related papers (2024-06-02T18:26:50Z)
- Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs).
In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- Neural Retriever and Go Beyond: A Thesis Proposal [1.082365064737981]
Information retrieval (IR) aims to find the documents relevant to a given query at large scale.
Recent neural-based algorithms (termed neural retrievers) have gained more attention, as they can mitigate the limitations of traditional methods.
arXiv Detail & Related papers (2022-05-31T17:59:30Z)
- Improved Aggregating and Accelerating Training Methods for Spatial Graph Neural Networks on Fraud Detection [0.0]
This work proposes an improved deep architecture that extends CAmouflage-REsistant GNN (CARE-GNN) to deep models, named Residual Layered CARE-GNN (RLC-GNN).
Three issues of RLC-GNN are that the use of neighboring information reaches a limit, that training is difficult, and that node features and external patterns are not comprehensively considered.
Experiments are conducted on Yelp and Amazon datasets.
arXiv Detail & Related papers (2022-02-14T09:51:35Z)
- Very Deep Graph Neural Networks Via Noise Regularisation [57.450532911995516]
Graph Neural Networks (GNNs) perform learned message passing over an input graph.
We train a deep GNN with up to 100 message passing steps and achieve several state-of-the-art results.
arXiv Detail & Related papers (2021-06-15T08:50:10Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of the Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
- Question Answering over Knowledge Bases by Leveraging Semantic Parsing and Neuro-Symbolic Reasoning [73.00049753292316]
We propose a semantic parsing and reasoning-based Neuro-Symbolic Question Answering (NSQA) system.
NSQA achieves state-of-the-art performance on QALD-9 and LC-QuAD 1.0.
arXiv Detail & Related papers (2020-12-03T05:17:55Z)
- Classifying Malware Using Function Representations in a Static Call Graph [0.0]
We propose a deep learning approach for identifying malware families using the function call graphs of x86 assembly instructions.
We test our approach by performing several experiments on a Microsoft malware classification data set and achieve excellent separation between malware families with a classification accuracy of 99.41%.
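As a loose illustration of classifying binaries by their function call graphs (the paper learns function representations with deep learning; this sketch swaps in a much simpler baseline with hypothetical features and data), each call graph is summarized as a fixed-length vector of graph statistics and fed to an off-the-shelf classifier:

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def call_graph_features(g: nx.DiGraph) -> np.ndarray:
    """Fixed-length summary of a function call graph (illustrative features)."""
    degrees = [d for _, d in g.degree()]
    return np.array([
        g.number_of_nodes(),
        g.number_of_edges(),
        float(np.mean(degrees)) if degrees else 0.0,
        float(np.max(degrees)) if degrees else 0.0,
        nx.number_weakly_connected_components(g),
        g.number_of_edges() / max(g.number_of_nodes(), 1),  # average out-degree
    ], dtype=float)

# Hypothetical training data: one call graph and one family label per sample.
graphs = [nx.gnp_random_graph(30, 0.1, directed=True) for _ in range(20)]
labels = np.random.randint(0, 3, size=20)          # stand-in malware families

X = np.stack([call_graph_features(g) for g in graphs])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
pred = clf.predict(X[:1])                          # predicted family for one sample
```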
arXiv Detail & Related papers (2020-12-01T20:36:19Z)
- Global Optimization of Objective Functions Represented by ReLU Networks [77.55969359556032]
Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts.
Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures.
We propose an approach that integrates the optimization process into the verification procedure, achieving better performance than the naive approach.
arXiv Detail & Related papers (2020-10-07T08:19:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.