Cross-Inlining Binary Function Similarity Detection
- URL: http://arxiv.org/abs/2401.05739v1
- Date: Thu, 11 Jan 2024 08:42:08 GMT
- Title: Cross-Inlining Binary Function Similarity Detection
- Authors: Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Ting Liu
- Abstract summary: We propose a pattern-based model named CI-Detector for cross-inlining matching.
Results show that CI-Detector can detect cross-inlining pairs with a precision of 81% and a recall of 97%, which exceeds all state-of-the-art works.
- Score: 16.923959153965857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary function similarity detection plays an important role in a wide range
of security applications. Existing works usually assume that the query function
and target function share equal semantics and compare their full semantics to
obtain the similarity. However, we find that the function mapping is more
complex, especially when function inlining happens.
In this paper, we will systematically investigate cross-inlining binary
function similarity detection. We first construct a cross-inlining dataset by
compiling 51 projects using 9 compilers, with 4 optimizations, to 6
architectures, with 2 inlining flags, which results in two datasets both with
216 combinations. Then we construct the cross-inlining function mappings by
linking the common source functions in these two datasets. Through analysis of
this dataset, we find that three cross-inlining patterns widely exist while
existing work suffers when detecting cross-inlining binary function similarity.
Next, we propose a pattern-based model named CI-Detector for cross-inlining
matching. CI-Detector uses an attributed control flow graph (CFG) to represent the semantics of
binary functions and a graph neural network (GNN) to embed them into vectors. CI-Detector
trains a separate model for each of the three cross-inlining patterns. Finally,
each testing pair is fed to all three models, and the resulting
similarities are aggregated into the final similarity. We conduct several
experiments to evaluate CI-Detector. Results show that CI-Detector can detect
cross-inlining pairs with a precision of 81% and a recall of 97%, which exceeds
all state-of-the-art works.
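The dataset arithmetic and the final aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The configuration labels are placeholders (only the counts 9, 4, 6 and 2 come from the abstract), the embeddings are plain lists standing in for GNN outputs, and taking the maximum of the three per-pattern similarities is an assumption, since the abstract does not say how the similarities are aggregated.

```python
from itertools import product
from math import sqrt

# Only the counts come from the abstract; the labels are placeholders.
COMPILERS = [f"compiler_{i}" for i in range(9)]
OPT_LEVELS = [f"opt_{i}" for i in range(4)]
ARCHITECTURES = [f"arch_{i}" for i in range(6)]
INLINING_FLAGS = ["inlining_on", "inlining_off"]  # hypothetical flag names


def build_configurations():
    """Return, per inlining flag, every (compiler, opt, arch) combination."""
    combos = list(product(COMPILERS, OPT_LEVELS, ARCHITECTURES))
    return {flag: list(combos) for flag in INLINING_FLAGS}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def aggregate_similarity(query_embeddings, target_embeddings):
    """Combine the per-pattern similarities into one score.

    Each argument holds three vectors, one per pattern-specific model.
    ASSUMPTION: the maximum is used as the aggregate; the abstract does
    not specify the aggregation function.
    """
    scores = [cosine(q, t) for q, t in zip(query_embeddings, target_embeddings)]
    return max(scores)


if __name__ == "__main__":
    datasets = build_configurations()
    # 9 compilers * 4 optimizations * 6 architectures = 216 per dataset
    print({flag: len(combos) for flag, combos in datasets.items()})

    # Toy embeddings: only the second pattern model sees a close match.
    query = [[1.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
    target = [[0.0, 1.0], [3.0, 4.0], [1.0, 0.0]]
    print(aggregate_similarity(query, target))
```

Using the maximum means a pair counts as similar as soon as any one pattern model matches it. A mean or a learned weighting would be an equally plausible choice under the abstract's wording.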
Related papers
- Is Function Similarity Over-Engineered? Building a Benchmark [37.33020176141435]
We build a new benchmark for binary function similarity detection consisting of high-quality datasets and tests that better reflect real-world use cases.
Our benchmark reveals that a new, simple baseline, one which looks at only the raw bytes of a function and requires no disassembly or other pre-processing, is able to achieve state-of-the-art performance in multiple settings.
arXiv Detail & Related papers (2024-10-30T03:59:46Z) - Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function (GSSF) to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z) - The object detection model uses combined extraction with KNN and RF classification [0.0]
This study contributes to the field of object detection with a new approach combining GLCM and LBP as feature vectors as well as VE for classification.
System testing used a dataset of 4,437 2D images; KNN achieved an accuracy of 92.7% and an F1-score of 92.5%, while RF performance was lower.
arXiv Detail & Related papers (2024-05-09T05:21:42Z) - FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z) - FASER: Binary Code Similarity Search through the use of Intermediate Representations [0.8594140167290099]
Cross-Architecture Binary Code Similarity Search has been explored in numerous studies.
We propose Function as a String Encoded Representation (FASER) to create a model capable of cross architecture function search.
arXiv Detail & Related papers (2023-10-05T15:36:35Z) - SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition [15.537230343119875]
This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets.
We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities.
arXiv Detail & Related papers (2023-09-21T13:06:30Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - UniASM: Binary Code Similarity Detection without Fine-tuning [0.8271859911016718]
We propose a novel transformer-based binary code embedding model named UniASM to learn representations of the binary functions.
In the real-world task of known vulnerability search, UniASM outperforms all the current baselines.
arXiv Detail & Related papers (2022-10-28T14:04:57Z) - Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with improvement on various architectures, and it achieves state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z) - Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations.
The basic architecture in image segmentation consists of an encoder and a decoder.
We compare several variants of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.