BinSym: Binary-Level Symbolic Execution using Formal Descriptions of Instruction Semantics
- URL: http://arxiv.org/abs/2404.04132v1
- Date: Fri, 5 Apr 2024 14:29:39 GMT
- Title: BinSym: Binary-Level Symbolic Execution using Formal Descriptions of Instruction Semantics
- Authors: Sören Tempel, Tobias Brandt, Christoph Lüth, Rolf Drechsler,
- Abstract summary: BinSym is a framework for symbolic program analysis of software in binary form.
It operates directly on binary code instructions and does not require lifting them to an intermediate representation.
- Score: 2.4576576560952788
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: BinSym is a framework for symbolic program analysis of software in binary form. Contrary to prior work, it operates directly on binary code instructions and does not require lifting them to an intermediate representation (IR). This is achieved by formulating the symbolic semantics on top of a formal description of binary code instruction semantics. By building on existing formal descriptions, BinSym eliminates the manual effort required by prior work to implement transformations to an IR, thereby reducing the margin for errors. Furthermore, BinSym's symbolic semantics can be directly related to the binary code, which improves symbolic execution speed by reducing solver query complexity.
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery [2.022692275087205]
Cross-architecture binary code analysis has become an emerging problem.
Deep learning-based binary analysis has shown promising success.
For some low-resource ISAs, an adequate amount of data is hard to find.
arXiv Detail & Related papers (2024-04-29T18:09:28Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - Code Representation Pre-training with Complements from Program
Executions [29.148208436656216]
We propose FuzzPretrain to explore the dynamic information of programs revealed by their test cases and embed it into the feature representations of code as complements.
FuzzyPretrain yielded more than 6%/9% mAP improvements on code search over its counterparts trained with only source code or AST.
arXiv Detail & Related papers (2023-09-04T01:57:22Z) - ProgSG: Cross-Modality Representation Learning for Programs in
Electronic Design Automation [38.023395256208055]
High-level synthesis (HLS) allows a developer to compile a high-level description in the form of software code in C and C++.
HLS tools still require microarchitecture decisions, expressed in terms of pragmas.
We propose ProgSG allowing the source code sequence modality and the graph modalities to interact with each other in a deep and fine-grained way.
arXiv Detail & Related papers (2023-05-18T09:44:18Z) - Linear Temporal Logic Modulo Theories over Finite Traces (Extended
Version) [72.38188258853155]
This paper studies Linear Temporal Logic over Finite Traces (LTLf)
proposition letters are replaced with first-order formulas interpreted over arbitrary theories.
The resulting logic, called Satisfiability Modulo Theories (LTLfMT), is semi-decidable.
arXiv Detail & Related papers (2022-04-28T17:57:33Z) - A Natural Language Processing Approach for Instruction Set Architecture
Identification [6.495883501989546]
We introduce character-level features of encoded binaries to identify fine-grained bit patterns inherent to each ISA.
Our approach results in an 8% higher accuracy than the state-of-the-art features based on byte-histograms and byte pattern signatures.
arXiv Detail & Related papers (2022-04-13T19:45:06Z) - Semantic-aware Binary Code Representation with BERT [27.908093567605484]
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code.
Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary.
In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code.
arXiv Detail & Related papers (2021-06-10T03:31:29Z) - How could Neural Networks understand Programs? [67.4217527949013]
It is difficult to build a model to better understand programs, by either directly applying off-the-shelf NLP pre-training techniques to the source code, or adding features to the model by theshelf.
We propose a novel program semantics learning paradigm, that the model should learn from information composed of (1) the representations which align well with the fundamental operations in operational semantics, and (2) the information of environment transition.
arXiv Detail & Related papers (2021-05-10T12:21:42Z) - High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z) - Representing Partial Programs with Blended Abstract Semantics [62.20775388513027]
We introduce a technique for representing partially written programs in a program synthesis engine.
We learn an approximate execution model implemented as a modular neural network.
We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs.
arXiv Detail & Related papers (2020-12-23T20:40:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.