SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with
MLIR
- URL: http://arxiv.org/abs/2308.07654v1
- Date: Tue, 15 Aug 2023 09:05:27 GMT
- Title: SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with
MLIR
- Authors: Jianyi Cheng, Samuel Coward, Lorenzo Chelini, Rafael Barbalho, Theo
Drane
- Abstract summary: High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description.
We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into HLS efficient code.
We show that SEER achieves up to 38x the performance within 1.4x the area of the original program.
- Score: 0.3124884279860061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-level synthesis (HLS) is a process that automatically translates a
software program in a high-level language into a low-level hardware
description. However, the hardware designs produced by HLS tools still suffer
from a significant performance gap compared to manual implementations. This is
because the input HLS programs must still be written using hardware design
principles.
Existing techniques either leave the program source unchanged or perform a
fixed sequence of source transformation passes, potentially missing
opportunities to find the optimal design. We propose a super-optimization
approach for HLS that automatically rewrites an arbitrary software program into
efficient HLS code that can be used to generate an optimized hardware design.
We developed a toolflow named SEER, based on the e-graph data structure, to
efficiently explore equivalent implementations of a program at scale. SEER
provides an extensible framework, orchestrating existing software compiler
passes and hardware synthesis optimizers.
Our work is the first attempt to exploit e-graph rewriting for large software
compiler frameworks, such as MLIR. Across a set of open-source benchmarks, we
show that SEER achieves up to 38x the performance within 1.4x the area of the
original program. Via an Intel-provided case study, SEER demonstrates the
potential to outperform manually optimized designs produced by hardware
experts.
Related papers
- RTLRewriter: Methodologies for Large Models aided RTL Code Optimization [21.61206887869307]
This paper introduces RTLRewriter, an innovative framework that leverages large models to optimize RTL code.
A circuit partition pipeline is utilized for fast synthesis and efficient rewriting.
A specialized search engine is designed to identify useful optimization guides, algorithms, and code snippets.
arXiv Detail & Related papers (2024-09-04T09:59:37Z) - Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis [45.471039079664656]
Domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving.
We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way.
We show that ProgSG reduces the RMSE of design performance predictions by up to $22%$, and identifies designs with an average of $1.10times$.
arXiv Detail & Related papers (2024-06-13T22:34:58Z) - An approach to performance portability through generic programming [0.0]
This work describes a design approach that allows the integration of low-level and verbose programming tools into high-level generic algorithms based on template meta-programming in C++.
That allows scientific software to be maintainable and efficient in a period of diversifying hardware in HPC.
arXiv Detail & Related papers (2023-11-08T21:54:43Z) - ChipGPT: How far are we from natural language hardware design [34.22592995908168]
This work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications.
We present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning.
arXiv Detail & Related papers (2023-05-23T12:54:02Z) - ProgSG: Cross-Modality Representation Learning for Programs in
Electronic Design Automation [38.023395256208055]
High-level synthesis (HLS) allows a developer to compile a high-level description in the form of software code in C and C++.
HLS tools still require microarchitecture decisions, expressed in terms of pragmas.
We propose ProgSG allowing the source code sequence modality and the graph modalities to interact with each other in a deep and fine-grained way.
arXiv Detail & Related papers (2023-05-18T09:44:18Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of performance-improving edits made by human programmers of over 77,000 competitive C++ programming submission pairs.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - Enabling Retargetable Optimizing Compilers for Quantum Accelerators via
a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL
primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.