Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity
Detection by Incorporating Domain Knowledge
- URL: http://arxiv.org/abs/2301.00511v2
- Date: Mon, 22 May 2023 02:01:35 GMT
- Authors: Shouguo Yang, Chaopeng Dong, Yang Xiao, Yiran Cheng, Zhiqiang Shi, Zhi Li, and Limin Sun
- Abstract summary: We propose a novel deep learning enhancement architecture by incorporating domain knowledge-based pre-filtration and re-ranking modules.
Asteria-Pro detects 1,482 vulnerable functions with a high precision of 91.65%.
- Score: 8.93208472340743
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Widespread code reuse allows vulnerabilities to proliferate across a vast
variety of firmware. There is an urgent need to detect this vulnerable code
effectively and efficiently. By measuring code similarity, AI-based binary
code similarity detection can be applied to detect vulnerable code at scale.
Existing studies have proposed various function features to capture the
commonality needed for similarity detection. Nevertheless, the significant syntactic
variability of code induced by the diversity of IoT hardware architectures
diminishes the accuracy of binary code similarity detection. In our earlier
study and its accompanying tool, Asteria, we adopted a Tree-LSTM network to summarize
function semantics as function commonality, and the evaluation results indicated
advanced performance. However, Asteria still has utility concerns due to excessive
time costs and inadequate precision when searching for bugs in large-scale
firmware.
To this end, we propose a novel deep learning enhancement architecture that
incorporates domain knowledge-based pre-filtration and re-ranking modules, and
we develop a prototype based on Asteria called Asteria-Pro. The pre-filtration
module seeks to eliminate dissimilar functions to speed up the subsequent deep
learning model computation, while the re-ranking module aims to raise the
rankings of vulnerable functions among the candidates generated by the deep
learning model. Our evaluation indicates that the pre-filtration module cuts
calculation time by 96.9%, and re-ranking improves MRR and Recall by 23.71% and
36.4%, respectively. By incorporating the pre-filtration and re-ranking modules,
Asteria-Pro outperforms existing state-of-the-art approaches in the bug search
task by a significant margin. We conduct a large-scale real-world firmware bug
search, in which Asteria-Pro detects 1,482 vulnerable functions with a high
precision of 91.65%.
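The abstract says Asteria summarizes function semantics with a Tree-LSTM. As an illustration only, the sketch below encodes a toy abstract syntax tree (AST) bottom-up with a Child-Sum Tree-LSTM (the variant of Tai et al., 2015) and compares two function encodings by cosine similarity. The Child-Sum variant, the random untrained weights, the integer node labels, and the names `Node`, `ChildSumTreeLSTM`, and `cosine` are assumptions made for this sketch; they are not Asteria's actual architecture or code, and a real system would learn the weights from pairs of similar and dissimilar functions.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node: an integer node-type label plus child nodes (hypothetical)."""
    label: int
    children: list = field(default_factory=list)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM with random, untrained weights (illustration only)."""

    def __init__(self, num_labels, emb_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.emb = mat(num_labels, emb_dim)  # one embedding vector per node label
        # Input weights W, recurrent weights U, bias b for gates i, f, o and update u.
        self.W = {g: mat(hidden, emb_dim) for g in "ifou"}
        self.U = {g: mat(hidden, hidden) for g in "ifou"}
        self.b = {g: [0.0] * hidden for g in "ifou"}

    def _gate(self, g, x, h, act):
        return [act(v) for v in add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

    def encode(self, node):
        """Return (h, c) for the subtree rooted at node, computed bottom-up."""
        child_states = [self.encode(ch) for ch in node.children]
        x = self.emb[node.label]
        # i, o and u see the sum of the children's hidden states.
        h_sum = add([0.0] * self.hidden, *(h for h, _ in child_states))
        i = self._gate("i", x, h_sum, sigmoid)
        o = self._gate("o", x, h_sum, sigmoid)
        u = self._gate("u", x, h_sum, math.tanh)
        c = [iv * uv for iv, uv in zip(i, u)]
        for h_k, c_k in child_states:
            f_k = self._gate("f", x, h_k, sigmoid)  # a separate forget gate per child
            c = [cv + fv * ckv for cv, fv, ckv in zip(c, f_k, c_k)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Usage: two hypothetical ASTs that differ by one extra leaf.
model = ChildSumTreeLSTM(num_labels=4)
f1 = Node(0, [Node(1), Node(2, [Node(3)])])
f2 = Node(0, [Node(1), Node(2, [Node(3), Node(3)])])
h1, _ = model.encode(f1)
h2, _ = model.encode(f2)
print(f"similarity(f1, f2) = {cosine(h1, h2):.4f}")
```

The root hidden state `h` serves as the function's fixed-size summary; because the encoder works on tree structure rather than raw instructions, it is one way to reduce sensitivity to architecture-specific instruction syntax, which is the motivation the abstract gives.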
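The three-stage search described in the abstract (pre-filtration, deep-learning ranking, re-ranking) and the two reported metrics, MRR and Recall, can be sketched as follows. This is a hypothetical illustration: the callee-set Jaccard filter, the additive re-ranking bonus, the thresholds, and all names (`Func`, `search`, `mrr`, `recall_at_k`) are placeholders, and `model_score` stands in for the output of a trained model such as Asteria's. The domain-knowledge features Asteria-Pro actually uses are defined in the paper, not here. Recall@k is computed as the fraction of ground-truth matches found in the top k, which is one common definition and may differ from the paper's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    name: str
    callees: frozenset   # hypothetical domain-knowledge feature: called library functions
    model_score: float   # stand-in for a trained model's similarity to the query


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query_callees, candidates, prefilter_min=0.2, bonus_weight=0.5):
    """Return (model-only ranking, re-ranked list) for one query function."""
    # Stage 1, pre-filtration: a cheap set overlap drops clearly dissimilar
    # functions, so the expensive model only has to score the survivors.
    survivors = [f for f in candidates if jaccard(query_callees, f.callees) >= prefilter_min]
    # Stage 2, deep-learning ranking (here just the precomputed stand-in score).
    ranked = sorted(survivors, key=lambda f: f.model_score, reverse=True)
    # Stage 3, re-ranking: add a weighted domain-knowledge bonus to the model score.
    reranked = sorted(
        ranked,
        key=lambda f: f.model_score + bonus_weight * jaccard(query_callees, f.callees),
        reverse=True,
    )
    return ranked, reranked


def mrr(rankings, relevant):
    """Mean reciprocal rank of the first relevant hit per query (0 if none)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for pos, name in enumerate(ranking, start=1):
            if name in rel:
                total += 1.0 / pos
                break
    return total / len(rankings)


def recall_at_k(rankings, relevant, k):
    """Fraction of all ground-truth matches that appear in the top k of their query."""
    hits = sum(len(set(r[:k]) & rel) for r, rel in zip(rankings, relevant))
    return hits / sum(len(rel) for rel in relevant)


# Usage with made-up data: one query, one true match ("vuln_fn").
query = frozenset({"memcpy", "strlen", "recv"})
candidates = [
    Func("vuln_fn", frozenset({"memcpy", "strlen", "recv"}), 0.80),
    Func("lookalike", frozenset({"memcpy", "strlen"}), 0.85),
    Func("unrelated", frozenset({"printf"}), 0.90),
    Func("partial", frozenset({"recv", "malloc"}), 0.70),
]
ranked, reranked = search(query, candidates)
before = [[f.name for f in ranked]]
after = [[f.name for f in reranked]]
truth = [{"vuln_fn"}]
print(mrr(before, truth), mrr(after, truth))                        # 0.5 1.0
print(recall_at_k(before, truth, 1), recall_at_k(after, truth, 1))  # 0.0 1.0
```

In this toy data, pre-filtration removes `unrelated` before any model scoring, even though it has the highest stand-in model score, and re-ranking moves `vuln_fn` from rank 2 to rank 1, raising MRR from 0.5 to 1.0. Placing the cheap filter first is what saves model computation; placing the re-ranker last lets it correct the model's ordering without re-running the model.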