Automatically Mitigating Vulnerabilities in Binary Programs via
Partially Recompilable Decompilation
- URL: http://arxiv.org/abs/2202.12336v2
- Date: Mon, 12 Jun 2023 16:28:54 GMT
- Title: Automatically Mitigating Vulnerabilities in Binary Programs via
Partially Recompilable Decompilation
- Authors: Pemma Reiter, Hui Jun Tay, Westley Weimer, Adam Doupé, Ruoyu Wang,
Stephanie Forrest
- Abstract summary: We propose Partially Recompilable Decompilation (PRD).
PRD lifts suspect binary functions to source, available for analysis, revision, or review, and creates a patched binary.
We evaluate PRD in two contexts: a fully automated process incorporating source-level Automated Program Repair (APR) methods, and human-edited source-level repairs.
- Score: 8.31538179550799
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vulnerabilities are challenging to locate and repair, especially when source
code is unavailable and binary patching is required. Manual methods are
time-consuming, require significant expertise, and do not scale to the rate at
which new vulnerabilities are discovered. Automated methods are an attractive
alternative, and we propose Partially Recompilable Decompilation (PRD). PRD
lifts suspect binary functions to source, available for analysis, revision, or
review, and creates a patched binary using source- and binary-level techniques.
Although decompilation and recompilation do not typically work on an entire
binary, our approach succeeds because it is limited to a few functions, like
those identified by our binary fault localization.
We evaluate these assumptions and find that, without any grammar or
compilation restrictions, 70-89% of individual functions are successfully
decompiled and recompiled with sufficient type recovery. In comparison, only
1.7% of the full C-binaries succeed. When decompilation succeeds, PRD produces
test-equivalent binaries 92.9% of the time.
In addition, we evaluate PRD in two contexts: a fully automated process
incorporating source-level Automated Program Repair (APR) methods, and
human-edited source-level repairs. When evaluated on DARPA Cyber Grand Challenge
(CGC) binaries, we find that PRD-enabled APR tools, operating only on binaries,
perform as well as, and sometimes better than, full-source tools, collectively
mitigating 85 of the 148 scenarios, a success rate consistent with these same
tools operating with access to the entire source code. PRD achieves success
rates similar to those of the winning CGC entries, sometimes finding
higher-quality mitigations than those produced by top CGC teams. For generality, our
evaluation includes two independently developed APR tools and C++, Rode0day,
and real-world binaries.
Related papers
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries [2.696054049278301]
We introduce DeBinVul, a novel decompiled binary code vulnerability dataset.
We fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in detecting binary code vulnerabilities.
arXiv Detail & Related papers (2024-11-07T18:54:31Z) - Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards from the past interactions can be captured experimentally.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z) - The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates and extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching [8.655595404611821]
We introduce BinaryAI, a novel binary-to-source SCA technique with two-phase binary source code matching to capture both syntactic and semantic code features.
Our experimental results demonstrate the superior performance of BinaryAI in terms of binary source code matching and the downstream SCA task.
arXiv Detail & Related papers (2024-01-20T07:57:57Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Boosting Neural Networks to Decompile Optimized Binaries [13.255618541522436]
Decompilation aims to transform a low-level program language (LPL) into its functionally-equivalent high-level program language (HPL).
We propose a novel learning-based approach named NeurDP, that targets compiler-optimized binaries.
arXiv Detail & Related papers (2023-01-03T06:45:54Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z) - Semantic-aware Binary Code Representation with BERT [27.908093567605484]
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code.
Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary.
In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code.
arXiv Detail & Related papers (2021-06-10T03:31:29Z) - Improving type information inferred by decompilers with supervised
machine learning [0.0]
In software reverse engineering, decompilation is the process of recovering source code from binary files.
We build different classification models capable of inferring the high-level type returned by functions.
Our system is able to predict function return types with a 79.1% F1-measure, whereas the best decompiler obtains a 30% F1-measure.
arXiv Detail & Related papers (2021-01-19T11:45:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.