Fugu-MT Paper Translation (Abstract): Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

Paper overview: Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

arxiv url: http://arxiv.org/abs/2602.12681v1
Date: Fri, 13 Feb 2026 07:23:15 GMT
Status: Translation complete
Last updated in system: 2026-02-16 23:37:53.878799
Title: Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations
Title (reference translation): On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations
Authors: Jiyong Uhm, Minseok Kim, Michalis Polychronakis, Hyungjoon Koo
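Abstract summary: We evaluate the robustness of deep learning models on the binary code similarity detection task. We construct a dataset of 9,565 binary variants from 620 baseline samples.
Reference score (independently computed attention score): 7.222996408214315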
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Binary code analysis plays an essential role in cybersecurity, facilitating reverse engineering to reveal the inner workings of programs in the absence of source code. Traditional approaches, such as static and dynamic analysis, extract valuable insights from stripped binaries, but often demand substantial expertise and manual effort. Recent advances in deep learning have opened promising opportunities to enhance binary analysis by capturing latent features and disclosing underlying code semantics. Despite the growing number of binary analysis models based on machine learning, their robustness to adversarial code transformations at the binary level remains underexplored. We evaluate the robustness of deep learning models for the task of binary code similarity detection (BCSD) under semantics-preserving transformations. The unique nature of machine instructions presents distinct challenges compared to the typical input perturbations found in other domains. We introduce asmFooler, a system that evaluates the resilience of BCSD models using a diverse set of adversarial code transformations that preserve functional semantics. We construct a dataset of 9,565 binary variants from 620 baseline samples by applying eight semantics-preserving transformations across six representative BCSD models. Our major findings highlight several key insights: i) model robustness relies on the processing pipeline, including code pre-processing, architecture, and feature selection; ii) adversarial transformation effectiveness is bounded by a budget shaped by model-specific constraints like input size and instruction expressive capacity; iii) well-crafted transformations can be highly effective with minimal perturbations; and iv) such transformations efficiently disrupt model decisions (e.g., misleading to false positives or false negatives) by focusing on semantically significant instructions.
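Abstract (reference translation): Binary code analysis plays an important role in cybersecurity, revealing through reverse engineering the inner workings of programs for which no source code is available. Traditional approaches such as static and dynamic analysis extract valuable insights from stripped binaries, but often require considerable expertise and manual effort. Recent advances in deep learning have opened promising opportunities to enhance binary analysis by capturing latent features and disclosing underlying code semantics. Despite the growing number of binary analysis models based on machine learning, their robustness to adversarial code transformations at the binary level remains underexplored. We evaluate the robustness of deep learning models on the binary code similarity detection (BCSD) task under semantics-preserving transformations. The unique nature of machine instructions poses challenges that differ from the typical input perturbations seen in other domains. We introduce asmFooler, a system that evaluates the resilience of BCSD models using a diverse set of adversarial code transformations that preserve functional semantics. We construct a dataset of 9,565 binary variants from 620 baseline samples by applying eight semantics-preserving transformations across six BCSD models. Our main findings highlight several key insights: i) model robustness depends on the processing pipeline, including code pre-processing, architecture, and feature selection; ii) the effectiveness of adversarial transformations is bounded by a budget shaped by model-specific constraints such as input size and instruction expressive capacity; iii) well-crafted transformations can be highly effective with minimal perturbations; and iv) such transformations effectively disrupt model decisions (e.g., false positives or false negatives) by focusing on semantically significant instructions.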
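
To make the idea of a semantics-preserving transformation under a perturbation budget concrete, here is a minimal toy sketch in Python. It is not asmFooler and does not use any of the evaluated BCSD models: the helper names (`insert_nops`, `bigrams`, `jaccard`), the sample instruction list, and the bigram-based Jaccard score are all assumptions made for illustration. The score is only a crude stand-in for a learned similarity model. The only transformation shown is `nop` insertion, which leaves program behavior unchanged while altering the instruction stream a model would see.

```python
import random


def insert_nops(instructions, budget, seed=0):
    """Return a copy of `instructions` with `budget` `nop`s inserted at random positions.

    `nop` has no architectural effect, so the variant keeps the original semantics.
    """
    rng = random.Random(seed)
    out = list(instructions)
    for _ in range(budget):
        # randint is inclusive on both ends, so len(out) appends at the end.
        out.insert(rng.randint(0, len(out)), "nop")
    return out


def bigrams(instructions):
    """Set of adjacent instruction pairs, a crude structural feature."""
    return {(a, b) for a, b in zip(instructions, instructions[1:])}


def jaccard(a, b):
    """Jaccard similarity over instruction bigrams (toy stand-in for a BCSD model)."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical baseline function: returns the sum of its two integer arguments.
baseline = [
    "push rbp",
    "mov rbp, rsp",
    "mov eax, edi",
    "add eax, esi",
    "pop rbp",
    "ret",
]

# Compare the baseline against variants built with increasing perturbation budgets.
for budget in (0, 1, 3):
    variant = insert_nops(baseline, budget)
    print(budget, round(jaccard(baseline, variant), 3))
```

With a budget of 0 the variant is identical to the baseline and the score is 1.0. Any inserted `nop` introduces bigrams that the baseline lacks, so the score falls below 1.0 even though the function computes the same result.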

Related papers