Forklift: An Extensible Neural Lifter
- URL: http://arxiv.org/abs/2404.16041v1
- Date: Mon, 1 Apr 2024 17:27:58 GMT
- Title: Forklift: An Extensible Neural Lifter
- Authors: Jordi Armengol-Estapé, Rodrigo C. O. Rocha, Jackson Woodruff, Pasquale Minervini, Michael F. P. O'Boyle
- Abstract summary: We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer.
We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift and set up an input/output-based accuracy harness.
We evaluate Forklift on two challenging benchmark suites and translate 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4, as well as enabling translation from new ISAs.
- Score: 11.633770744027682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators to map between their respective assembly languages. However, the development of these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique where source assembly code is translated to an architecture-independent intermediate representation (IR) (for example, the LLVM IR) and a pre-existing compiler recompiles the IR to the target ISA. However, the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code and require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer. We show how to incrementally add support for new ISAs by fine-tuning the assembly encoder and freezing the IR decoder, improving the overall accuracy and efficiency. We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift and set up an input/output-based accuracy harness. We evaluate Forklift on two challenging benchmark suites and translate 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4, as well as enabling translation from new ISAs.
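The incremental recipe above (freeze the IR decoder, fine-tune a per-ISA assembly encoder) is easy to picture in code. Below is a minimal sketch assuming a vanilla PyTorch nn.Transformer and placeholder tensors; it is our illustration of the training setup, not the authors' implementation.

```python
import torch
from torch import nn

# A standard encoder-decoder Transformer stands in for the neural lifter.
model = nn.Transformer(d_model=512, num_encoder_layers=6, num_decoder_layers=6)

# Freeze the IR decoder: it already knows how to emit LLVM IR tokens.
for p in model.decoder.parameters():
    p.requires_grad = False

# Only the new ISA's assembly encoder receives gradient updates.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

src = torch.rand(128, 8, 512)  # stand-in embeddings for new-ISA assembly
tgt = torch.rand(256, 8, 512)  # stand-in embeddings for LLVM IR tokens
loss = model(src, tgt).pow(2).mean()  # placeholder loss for illustration
loss.backward()
optimizer.step()
```

Because the decoder's gradients are never computed, each new ISA only pays for encoder training, and every ISA shares one IR emitter.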
Related papers
- ALTA: Compiler-Based Analysis of Transformers [56.76482035060707]
We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights.
ALTA is inspired by RASP, a language proposed by Weiss et al.
We show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks.
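To make "length-invariant algorithm for parity" concrete, here is a RASP-flavored sketch; the primitive name (aggregate_uniform) and structure are ours, not ALTA syntax. It uses only the kinds of operations this family of compilers maps onto attention and position-wise layers: a uniform aggregation plus per-position maps, so the same program works at every sequence length.

```python
def aggregate_uniform(values):
    # Uniform attention: every position sees the average of all positions.
    n = len(values)
    return [sum(values) / n] * n

def parity(bits):
    mean = aggregate_uniform([float(b) for b in bits])[0]
    ones = round(mean * len(bits))  # recover the count of ones from the mean
    return ones % 2                 # position-wise map on the recovered count

assert parity([1, 0, 1, 1]) == 1           # three ones -> odd parity
assert parity([1, 0, 1, 0, 1, 0]) == 1     # length-invariant: works for any n
```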
arXiv Detail & Related papers (2024-10-23T17:58:49Z) - mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis [48.01697184432969]
mlirSynth translates programs from lower-level MLIR dialects to high-level ones without manually defined rules.
We demonstrate its effectiveness by raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect-specific compilation flows.
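A toy version of synthesis-based raising, under our own simplifications: enumerate candidate high-level ops and accept one that agrees with the low-level reference on randomly generated inputs. mlirSynth works on MLIR dialects and validates candidates far more rigorously; this only shows the shape of the search.

```python
import random

def reference(xs):
    # "Low-level" behavior we want to raise: a scalar loop summing squares.
    out = 0
    for x in xs:
        out += x * x
    return out

# Hypothetical high-level candidate ops the synthesizer may choose from.
CANDIDATES = {
    "sum":      lambda xs: sum(xs),
    "dot_self": lambda xs: sum(x * x for x in xs),
    "max":      lambda xs: max(xs),
}

def synthesize(trials=50):
    tests = [[random.randint(-9, 9) for _ in range(8)] for _ in range(trials)]
    for name, op in CANDIDATES.items():
        if all(op(t) == reference(t) for t in tests):
            return name  # candidate agrees with the reference on all tests
    return None

print(synthesize())  # -> "dot_self"
```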
arXiv Detail & Related papers (2023-10-06T12:21:50Z) - Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
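A hedged sketch of the "guess" half: turn the LM's logits into per-token confidences and flag low-confidence positions for the symbolic stage. The threshold and the specific feature here are ours for illustration; the paper extracts richer alignment and confidence features.

```python
import math

def token_confidences(logits):
    """Softmax probability of the argmax token, per output position."""
    confs = []
    for row in logits:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        confs.append(max(exps) / sum(exps))
    return confs

def low_confidence_spans(confs, threshold=0.9):
    """Positions the symbolic solver should re-derive."""
    return [i for i, c in enumerate(confs) if c < threshold]

# Three positions over a 3-token vocabulary; position 1 is uncertain.
logits = [[5.0, 0.1, 0.1], [1.0, 0.9, 0.8], [4.0, 0.2, 0.1]]
print(low_confidence_spans(token_confidences(logits)))  # -> [1]
```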
arXiv Detail & Related papers (2023-09-25T15:42:18Z) - SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR [0.3124884279860061]
High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description.
We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into HLS efficient code.
We show that SEER achieves up to 38x the performance within 1.4x the area of the original program.
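The e-graph idea, reduced to a toy: apply rewrite rules to grow a set of equivalent expressions, then extract the variant with the lowest (made-up) hardware cost. A real e-graph shares subterms and saturates recursively; this sketch only rewrites at the root.

```python
# Expressions are tuples: ("*", a, b), ("+", a, b), ("<<", a, k), or a leaf.
RULES = [
    # x * 2  =>  x << 1   (a shift is cheaper than a multiplier)
    lambda e: ("<<", e[1], 1) if e[0] == "*" and e[2] == 2 else None,
    # x + x  =>  x << 1
    lambda e: ("<<", e[1], 1) if e[0] == "+" and e[1] == e[2] else None,
]

COST = {"*": 10, "+": 2, "<<": 1}  # invented per-op hardware costs

def cost(e):
    if not isinstance(e, tuple):
        return 0
    return COST[e[0]] + sum(cost(c) for c in e[1:] if isinstance(c, tuple))

def saturate(expr):
    seen, frontier = {expr}, [expr]
    while frontier:
        e = frontier.pop()
        if not isinstance(e, tuple):
            continue
        for rule in RULES:
            r = rule(e)
            if r is not None and r not in seen:
                seen.add(r)
                frontier.append(r)
    return min(seen, key=cost)  # extract the cheapest equivalent form

print(saturate(("*", "x", 2)))  # -> ('<<', 'x', 1)
```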
arXiv Detail & Related papers (2023-08-15T09:05:27Z) - LegoNN: Building Modular Encoder-Decoder Models [117.47858131603112]
State-of-the-art encoder-decoder models are constructed and trained end-to-end as an atomic unit.
No component of the model can be (re-)used without the others, making it impossible to share parts.
We describe LegoNN, a procedure for building encoder-decoder architectures whose parts can be applied to other tasks without the need for fine-tuning.
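A minimal sketch of the modularity idea as we read it: modules exchange a fixed, interpretable interface (here, per-position distributions over a shared vocabulary), so a decoder trained once can consume any encoder that speaks the interface. All names and sizes below are ours.

```python
import torch
from torch import nn

VOCAB = 1000  # shared interface vocabulary

class Encoder(nn.Module):
    def __init__(self, d_in):
        super().__init__()
        self.proj = nn.Linear(d_in, VOCAB)
    def forward(self, x):
        # Interface: per-position distributions over the shared vocabulary.
        return self.proj(x).softmax(dim=-1)

class Decoder(nn.Module):
    def __init__(self, d_out):
        super().__init__()
        self.embed = nn.Linear(VOCAB, d_out)  # consumes the interface only
    def forward(self, dist):
        return self.embed(dist)

dec = Decoder(d_out=256)  # trained once, reused as-is below
for enc in (Encoder(d_in=80), Encoder(d_in=512)):  # swappable encoders
    y = dec(enc(torch.rand(4, 10, enc.proj.in_features)))
    print(y.shape)  # torch.Size([4, 10, 256])
```

Because the decoder never sees encoder internals, only distributions over the shared vocabulary, either encoder can be swapped in without touching it.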
arXiv Detail & Related papers (2022-06-07T14:08:07Z) - Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
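Progressive lowering is the core mechanic of such a multi-level IR. Here is a tiny illustrative pass (our toy, not the paper's MLIR-based IR) that rewrites a high-level SWAP gate into the CNOT basis a backend supports.

```python
# A gate is a tuple: (name, *qubits), e.g. ("cx", 0, 1).

def lower(circuit, basis=frozenset({"cx", "h", "rz"})):
    """Rewrite gates outside the backend's basis into supported ones."""
    out = []
    for g in circuit:
        name = g[0]
        if name in basis:
            out.append(g)
        elif name == "swap":
            a, b = g[1], g[2]
            # Standard identity: SWAP(a, b) = CX(a,b) CX(b,a) CX(a,b)
            out += [("cx", a, b), ("cx", b, a), ("cx", a, b)]
        else:
            raise NotImplementedError(name)
    return out

print(lower([("h", 0), ("swap", 0, 1)]))
# [('h', 0), ('cx', 0, 1), ('cx', 1, 0), ('cx', 0, 1)]
```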
arXiv Detail & Related papers (2021-09-01T17:29:47Z) - A MLIR Dialect for Quantum Assembly Languages [78.8942067357231]
We demonstrate the utility of the Multi-Level Intermediate Representation (MLIR) for quantum computing.
We extend MLIR with a new quantum dialect that enables the expression and compilation of common quantum assembly languages.
We leverage a qcor-enabled implementation of the QIR quantum runtime API to enable a retargetable (quantum hardware agnostic) compiler workflow.
arXiv Detail & Related papers (2021-01-27T13:00:39Z) - Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients [6.09170287691728]
This paper presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework.
Enzyme synthesizes gradients for programs written in any language whose compiler targets the LLVM intermediate representation (IR).
On a machine-learning focused benchmark suite including Microsoft's ADBench, AD on optimized IR achieves a geometric mean speedup of 4.5x over AD on IR before optimization.
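To see what an AD transform computes, here is a micro reverse-mode sketch over a recorded expression graph; this is our toy illustration, whereas Enzyme performs the same propagation directly on (optimized) LLVM IR.

```python
class Var:
    """A recorded value: enough graph to run reverse-mode AD over + and *."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0
    def __add__(self, o):
        return Var(self.value + o.value, [(self, 1.0), (o, 1.0)])
    def __mul__(self, o):
        return Var(self.value * o.value, [(self, o.value), (o, self.value)])

def backward(out):
    # Topologically order the graph, then push each adjoint exactly once.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

x, y = Var(3.0), Var(2.0)
f = x * y + x
backward(f)
print(x.grad, y.grad)  # df/dx = y + 1 = 3.0, df/dy = x = 3.0
```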
arXiv Detail & Related papers (2020-10-04T22:32:51Z)