Related papers: A High-level Synthesis Toolchain for the Julia Language

A High-level Synthesis Toolchain for the Julia Language

URL: http://arxiv.org/abs/2512.15679v1
Date: Wed, 17 Dec 2025 18:32:06 GMT
Title: A High-level Synthesis Toolchain for the Julia Language
Authors: Benedict Short, Ian McInerney, John Wickerson,
Abstract summary: We propose a new MLIR-based compiler toolchain that unifies the development process by automatically compiling kernels written in the Julia programming language into SystemVerilog.<n>Our toolchain supports both dynamic and static scheduling, directly integrates with the AXI4-Stream protocol, and generates vendor-agnostic RTL.<n>This prototype toolchain is able to synthesize a set of signal processing/mathematical benchmarks that can operate at 100MHz on real FPGA devices.
Score: 1.7995266833057173
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the push towards Exascale computing and data-driven methods, problem sizes have increased dramatically, increasing the computational requirements of the underlying algorithms. This has led to a push to offload computations to general purpose hardware accelerators such as GPUs and TPUs, and a renewed interest in designing problem-specific accelerators using FPGAs. However, the development process of these problem-specific accelerators currently suffers from the "two-language problem": algorithms are developed in one (usually higher-level) language, but the kernels are implemented in another language at a completely different level of abstraction and requiring fundamentally different expertise. To address this problem, we propose a new MLIR-based compiler toolchain that unifies the development process by automatically compiling kernels written in the Julia programming language into SystemVerilog without the need for any additional directives or language customisations. Our toolchain supports both dynamic and static scheduling, directly integrates with the AXI4-Stream protocol to interface with subsystems like on- and off-chip memory, and generates vendor-agnostic RTL. This prototype toolchain is able to synthesize a set of signal processing/mathematical benchmarks that can operate at 100MHz on real FPGA devices, achieving between 59.71% and 82.6% of the throughput of designs generated by state-of-the-art toolchains that only compile from low-level languages like C or C++. Overall, this toolchain allows domain experts to write compute kernels in Julia as they normally would, and then retarget them to an FPGA without additional pragmas or modifications.

Related papers

Understanding Accelerator Compilers via Performance Profiling [1.1841612917872066]
Accelerator design languages (ADLs) are high-level languages that compile to hardware units.<n>We introduce Petal, a cycle-level tool for understanding how the compiler's decisions affect performance.<n>We show that Petal's cycle-level profiles can identify performance problems in existing designs.
arXiv Detail & Related papers (2025-11-24T22:40:11Z)
NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding [54.88765757043535]
This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference.<n>Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types with less than 7% computational overhead.<n>The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search.
arXiv Detail & Related papers (2025-05-28T20:43:10Z)
Hardware.jl - An MLIR-based Julia HLS Flow (Work in Progress) [1.2152813244704233]
We are developing a reusable end-to-end compiler toolchain for the Julia language.<n>This unifies accelerator and algorithm development by automatically synthesising Julia source code into high-performance Verilog.
arXiv Detail & Related papers (2025-03-12T15:12:12Z)
Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation [1.7646846505225735]
We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing. We test the generated kernel codes for a variety of language-supported programming models. We propose a proficiency metric around the initial 10 suggestions given for each prompt.
arXiv Detail & Related papers (2023-06-27T00:11:31Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
QParallel: Explicit Parallelism for Programming Quantum Computers [62.10004571940546]
We present a language extension for parallel quantum programming. QParallel removes ambiguities concerning parallelism in current quantum programming languages. We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv Detail & Related papers (2022-10-07T16:35:16Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
StreamBlocks: A compiler for heterogeneous dataflow computing (technical report) [1.5293427903448022]
This work introduces StreamBlocks, an open-source compiler and runtime that uses the CAL dataflow programming language to partition computations across platforms. StreamBlocks supports exploring the design space with a profile-guided tool that helps identify the best hardware-software partitions.
arXiv Detail & Related papers (2021-07-20T08:46:47Z)
UNIT: Unifying Tensorized Instruction Compilation [11.193044425743981]
Hardware vendors offer tensorized instructions for mixed-precision operations, like Intel VNNI, Core, and ARM-DOT. The lack of compilation techniques for this makes it hard to utilize these instructions. We develop a compiler framework to unify the compilation for these instructions.
arXiv Detail & Related papers (2021-01-21T06:22:58Z)
Extending C++ for Heterogeneous Quantum-Classical Computing [56.782064931823015]
qcor is a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context. Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language manner.
arXiv Detail & Related papers (2020-10-08T12:49:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.