Related papers: Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

URL: http://arxiv.org/abs/2602.12681v1
Date: Fri, 13 Feb 2026 07:23:15 GMT
Title: Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations
Authors: Jiyong Uhm, Minseok Kim, Michalis Polychronakis, Hyungjoon Koo,
Abstract summary: We evaluate the robustness of deep learning models for the task of binary code similarity detection.<n>We construct a dataset of 9,565 binary variants from 620 baseline samples.
Score: 7.222996408214315
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Binary code analysis plays an essential role in cybersecurity, facilitating reverse engineering to reveal the inner workings of programs in the absence of source code. Traditional approaches, such as static and dynamic analysis, extract valuable insights from stripped binaries, but often demand substantial expertise and manual effort. Recent advances in deep learning have opened promising opportunities to enhance binary analysis by capturing latent features and disclosing underlying code semantics. Despite the growing number of binary analysis models based on machine learning, their robustness to adversarial code transformations at the binary level remains underexplored. We evaluate the robustness of deep learning models for the task of binary code similarity detection (BCSD) under semantics-preserving transformations. The unique nature of machine instructions presents distinct challenges compared to the typical input perturbations found in other domains. We introduce asmFooler, a system that evaluates the resilience of BCSD models using a diverse set of adversarial code transformations that preserve functional semantics. We construct a dataset of 9,565 binary variants from 620 baseline samples by applying eight semantics-preserving transformations across six representative BCSD models. Our major findings highlight several key insights: i) model robustness relies on the processing pipeline, including code pre-processing, architecture, and feature selection; ii) adversarial transformation effectiveness is bounded by a budget shaped by model-specific constraints like input size and instruction expressive capacity; iii) well-crafted transformations can be highly effective with minimal perturbations; and iv) such transformations efficiently disrupt model decisions (e.g., misleading to false positives or false negatives) by focusing on semantically significant instructions.

Related papers

Readability-Robust Code Summarization via Meta Curriculum Learning [53.44612630063336]
In the real world, code is often poorly structured or obfuscated, significantly degrading model performance.<n>We propose RoFTCodeSum, a novel fine-tuning method that enhances the robustness of code summarization against poorly readable code.
arXiv Detail & Related papers (2026-01-09T02:38:24Z)
Cross-modal Retrieval Models for Stripped Binary Analysis [62.89251403093734]
BinSeek is the first two-stage cross-modal retrieval framework for stripped binary code analysis.<n>It consists of two models: BinSeekEmbedding is trained on large-scale dataset to learn the semantic relevance of the binary code.<n>BinSeek-Reranker learns to carefully judge the relevance of the candidate code to the description with context augmentation.
arXiv Detail & Related papers (2025-12-11T07:58:10Z)
Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code [55.493408628371235]
We propose ByteTR, a framework for recovering variable types in binary code.<n>In light of the ubiquity of variable propagation across functions, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data flow dependencies for variable type recovery.
arXiv Detail & Related papers (2025-03-10T12:27:05Z)
A Progressive Transformer for Unifying Binary Code Embedding and Knowledge Transfer [15.689556592544667]
We introduce ProTST, a novel transformer-based methodology for binary code embedding.<n>ProTST employs a hierarchical training process based on a unique tree-like structure.<n>Results show that ProTST yields an average validation score (F1, MRR, and Recall@1) improvement of 14.8% compared to traditional two-stage training.
arXiv Detail & Related papers (2024-12-15T13:04:29Z)
Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
Analysing the Behaviour of Tree-Based Neural Networks in Regression Tasks [3.912345988363511]
This paper endeavours to decode the behaviour of tree-based neural network models in the context of regression challenges. We extend the application of established models--tree-based CNNs, Code2Vec, and Transformer-based methods--to predict the execution time of source code by parsing it to an AST. Our proposed dual transformer demonstrates remarkable adaptability and robust performance across diverse datasets.
arXiv Detail & Related papers (2024-06-17T11:47:14Z)
Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification. The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
UniASM: Binary Code Similarity Detection without Fine-tuning [2.2329530239800035]
We propose a novel rich-semantic function representation technique to ensure the model captures the intricate nuances of binary code.<n>We introduce the first UniLM-based binary code embedding model, named UniASM, which includes two newly designed training tasks.<n>The experimental results show that UniASM outperforms the state-of-the-art (SOTA) approaches on the evaluation datasets.
arXiv Detail & Related papers (2022-10-28T14:04:57Z)
Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders. We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators [100.58924375509659]
Straight-through (ST) estimator gained popularity due to its simplicity and efficiency. Several techniques were proposed to improve over ST while keeping the same low computational complexity. We conduct a theoretical analysis of Bias and Variance of these methods in order to understand tradeoffs and verify originally claimed properties.
arXiv Detail & Related papers (2021-10-07T15:16:07Z)
Semantic-aware Binary Code Representation with BERT [27.908093567605484]
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code. Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary. In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code.
arXiv Detail & Related papers (2021-06-10T03:31:29Z)
Certified Robustness to Programmable Transformations in LSTMs [14.587069421684157]
Deep neural networks for natural language processing are fragile in the face of adversarial examples. We present an approach to certifying LSTMs of extensions LSTMs that can be efficiently certified.
arXiv Detail & Related papers (2021-02-15T19:54:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.