Related papers: SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR

URL: http://arxiv.org/abs/2308.07654v1
Date: Tue, 15 Aug 2023 09:05:27 GMT
Title: SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR
Authors: Jianyi Cheng, Samuel Coward, Lorenzo Chelini, Rafael Barbalho, Theo Drane
Abstract summary: High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into efficient HLS code. We show that SEER achieves up to 38x the performance within 1.4x the area of the original program.
Score: 0.3124884279860061
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap compared to manual implementations. This is because the input HLS programs must still be written using hardware design principles. Existing techniques either leave the program source unchanged or perform a fixed sequence of source transformation passes, potentially missing opportunities to find the optimal design. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into efficient HLS code that can be used to generate an optimized hardware design. We developed a toolflow named SEER, based on the e-graph data structure, to efficiently explore equivalent implementations of a program at scale. SEER provides an extensible framework, orchestrating existing software compiler passes and hardware synthesis optimizers. Our work is the first attempt to exploit e-graph rewriting for large software compiler frameworks, such as MLIR. Across a set of open-source benchmarks, we show that SEER achieves up to 38x the performance within 1.4x the area of the original program. Via an Intel-provided case study, SEER demonstrates the potential to outperform manually optimized designs produced by hardware experts.

Related papers

Guided Tensor Lifting [54.10411390218929]
Domain-specific languages (DSLs) for machine learning are revolutionizing the speed and efficiency of machine learning workloads. To take advantage of these capabilities, a user must first translate their legacy code from the language it is currently written in into the new DSL. The process of automatically lifting code into these DSLs has been identified by several recent works, which propose program synthesis as a solution.
arXiv Detail & Related papers (2025-04-28T12:00:10Z)
HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks [4.71707720395444]
We introduce HLS-Eval, the first complete benchmark and evaluation framework for HLS-driven design. The benchmark includes 94 unique designs drawn from standard HLS benchmarks and novel sources. Beyond the benchmark, HLS-Eval offers a modular Python framework for automated, parallel evaluation of both local and hosted LLMs.
arXiv Detail & Related papers (2025-04-16T17:30:36Z)
Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective [18.791753740931185]
OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization.
arXiv Detail & Related papers (2025-03-17T01:21:39Z)
RTLRewriter: Methodologies for Large Models aided RTL Code Optimization [21.61206887869307]
This paper introduces RTLRewriter, an innovative framework that leverages large models to optimize RTL code. A circuit partition pipeline is utilized for fast synthesis and efficient rewriting. A specialized search engine is designed to identify useful optimization guides, algorithms, and code snippets.
arXiv Detail & Related papers (2024-09-04T09:59:37Z)
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis [45.471039079664656]
Domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way. We show that ProgSG reduces the RMSE of design performance predictions by up to $22\%$, and identifies designs with an average of $1.10\times$.
arXiv Detail & Related papers (2024-06-13T22:34:58Z)
An approach to performance portability through generic programming [0.0]
This work describes a design approach that allows the integration of low-level and verbose programming tools into high-level generic algorithms based on template meta-programming in C++. That allows scientific software to be maintainable and efficient in a period of diversifying hardware in HPC.
arXiv Detail & Related papers (2023-11-08T21:54:43Z)
ChipGPT: How far are we from natural language hardware design [34.22592995908168]
This work attempts to demonstrate an automated design environment that explores LLMs to generate hardware logic designs from natural language specifications. We present a scalable four-stage zero-code logic design framework based on LLMs without retraining or finetuning.
arXiv Detail & Related papers (2023-05-23T12:54:02Z)
ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation [38.023395256208055]
High-level synthesis (HLS) allows a developer to compile a high-level description in the form of software code in C and C++. HLS tools still require microarchitecture decisions, expressed in terms of pragmas. We propose ProgSG allowing the source code sequence modality and the graph modalities to interact with each other in a deep and fine-grained way.
arXiv Detail & Related papers (2023-05-18T09:44:18Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization. First, we curate a dataset of performance-improving edits made by human programmers, comprising over 77,000 competitive C++ programming submission pairs. For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we propose performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models. We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives. We develop novel data reuse analysis algorithms using the polyhedral model. We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.