Related papers: Cross-Inlining Binary Function Similarity Detection

Cross-Inlining Binary Function Similarity Detection

URL: http://arxiv.org/abs/2401.05739v1
Date: Thu, 11 Jan 2024 08:42:08 GMT
Title: Cross-Inlining Binary Function Similarity Detection
Authors: Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Ting Liu
Abstract summary: We propose a pattern-based model named CI-Detector for cross-inlining matching. Results show that CI-Detector can detect cross-inlining pairs with a precision of 81% and a recall of 97%, which exceeds all state-of-the-art works.
Score: 16.923959153965857
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Binary function similarity detection plays an important role in a wide range of security applications. Existing works usually assume that the query function and target function share equal semantics and compare their full semantics to obtain the similarity. However, we find that the function mapping is more complex, especially when function inlining happens. In this paper, we will systematically investigate cross-inlining binary function similarity detection. We first construct a cross-inlining dataset by compiling 51 projects using 9 compilers, with 4 optimizations, to 6 architectures, with 2 inlining flags, which results in two datasets both with 216 combinations. Then we construct the cross-inlining function mappings by linking the common source functions in these two datasets. Through analysis of this dataset, we find that three cross-inlining patterns widely exist while existing work suffers when detecting cross-inlining binary function similarity. Next, we propose a pattern-based model named CI-Detector for cross-inlining matching. CI-Detector uses the attributed CFG to represent the semantics of binary functions and GNN to embed binary functions into vectors. CI-Detector respectively trains a model for these three cross-inlining patterns. Finally, the testing pairs are input to these three models and all the produced similarities are aggregated to produce the final similarity. We conduct several experiments to evaluate CI-Detector. Results show that CI-Detector can detect cross-inlining pairs with a precision of 81% and a recall of 97%, which exceeds all state-of-the-art works.

Related papers

BinCoFer: Three-Stage Purification for Effective C/C++ Binary Third-Party Library Detection [3.406168883492101]
Third-party libraries (TPL) are becoming increasingly popular to achieve efficient and concise software development. unregulated use of TPL will introduce legal and security issues in software development. BinCoFer is a tool designed for detecting TPLs reused in binary programs.
arXiv Detail & Related papers (2025-04-28T07:57:42Z)
Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code [55.493408628371235]
We propose ByteTR, a framework for recovering variable types in binary code. In light of the ubiquity of variable propagation across functions, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data flow dependencies for variable type recovery.
arXiv Detail & Related papers (2025-03-10T12:27:05Z)
Is Function Similarity Over-Engineered? Building a Benchmark [37.33020176141435]
We build a new benchmark for binary function similarity detection consisting of high-quality datasets and tests that better reflect real-world use cases. Our benchmark reveals that a new, simple basline, one which looks at only the raw bytes of a function, and requires no disassembly or other pre-processing, is able to achieve state-of-the-art performance in multiple settings.
arXiv Detail & Related papers (2024-10-30T03:59:46Z)
Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse to capture powerful relationships across modalities for pair-wise similarity learning. The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms. Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
The object detection model uses combined extraction with KNN and RF classification [0.0]
This study contributes to the field of object detection with a new approach combining GLCM and LBP as feature vectors as well as VE for classification. System testing used a dataset of 4,437 2D images, the results for KNN accuracy were 92.7% and F1-score 92.5%, while RF performance was lower.
arXiv Detail & Related papers (2024-05-09T05:21:42Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
FASER: Binary Code Similarity Search through the use of Intermediate Representations [0.8594140167290099]
Cross-Architecture Binary Code Similarity Search has been explored in numerous studies. We propose Function as a String Encoded Representation (FASER) to create a model capable of cross architecture function search.
arXiv Detail & Related papers (2023-10-05T15:36:35Z)
SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition [15.537230343119875]
This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities.
arXiv Detail & Related papers (2023-09-21T13:06:30Z)
3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds. Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
UniASM: Binary Code Similarity Detection without Fine-tuning [0.8271859911016718]
We propose a novel transformer-based binary code embedding model named UniASM to learn representations of the binary functions. In the real-world task of known vulnerability search, UniASM outperforms all the current baselines.
arXiv Detail & Related papers (2022-10-28T14:04:57Z)
Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations. We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions. Our method can be combined with improvement on various architectures, and it achieves state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
Deep ensembles based on Stochastic Activation Selection for Polyp Segmentation [82.61182037130406]
This work deals with medical image segmentation and in particular with accurate polyp detection and segmentation during colonoscopy examinations. Basic architecture in image segmentation consists of an encoder and a decoder. We compare some variant of the DeepLab architecture obtained by varying the decoder backbone.
arXiv Detail & Related papers (2021-04-02T02:07:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.