Related papers: A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

URL: http://arxiv.org/abs/2507.04828v1
Date: Mon, 07 Jul 2025 09:50:15 GMT
Title: A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization
Authors: Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann,
Abstract summary: We introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators.<n>Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators.<n>Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.
Score: 1.2828127925625228
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML) compilers remains a complex challenge due to the need for significant modifications in compilation layers and specialized scheduling techniques. Existing frameworks offer partial solutions and require users to navigate intricate compiler internals. In this paper, we introduce a TVM-based compilation integration approach that targets GEMM-based deep learning accelerators. Our approach abstracts the complexities of compiler integration, enabling seamless integration of accelerators without requiring in-depth knowledge of the underlying compiler. Furthermore, we extend and incorporate design space exploration tools, specifically CoSA, to automate efficient tensor scheduling, accounting for factors such as uneven mapping and double buffering. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to its specialized manually implemented toolchain.

Related papers

Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts [72.22148263683037]
We study the properties of sparse adapters, which train only a subset of weights in the base neural network, as potential building blocks of modular architectures.<n>First, we propose a simple method for training highly effective sparse adapters, which is conceptually simpler than existing methods in the literature.<n>Next, we investigate the merging properties of these sparse adapters by merging adapters for up to 20 natural language processing tasks.
arXiv Detail & Related papers (2025-07-09T03:25:45Z)
CompileAgent: Automated Real-World Repo-Level Compilation with Tool-Integrated LLM-based Agent System [52.048087777953064]
We propose CompileAgent, an agent framework dedicated to repo-level compilation.<n>CompileAgent integrates five tools and a flow-based agent strategy, enabling interaction with software artifacts for compilation instruction search and error resolution.<n>We show that our method significantly improves the compilation success rate, ranging from 10% to 71%.
arXiv Detail & Related papers (2025-05-07T08:59:14Z)
Semi-supervised Semantic Segmentation with Multi-Constraint Consistency Learning [81.02648336552421]
We propose a Multi-Constraint Consistency Learning approach to facilitate the staged enhancement of the encoder and decoder.<n>Self-adaptive feature masking and noise injection are designed in an instance-specific manner to perturb the features for robust learning of the decoder.<n> Experimental results on Pascal VOC2012 and Cityscapes datasets demonstrate that our proposed MCCL achieves new state-of-the-art performance.
arXiv Detail & Related papers (2025-03-23T03:21:33Z)
LLM-Aided Compilation for Tensor Accelerators [6.709490736813537]
We discuss how large language models (LLMs) could be leveraged to build a compiler for hardware accelerators. Specifically, we demonstrate the ability of GPT-4 to achieve high pass rates in translating code to the Gemmini accelerator. We also propose a 2-phase workflow for utilizing LLMs to generate hardware-optimized code.
arXiv Detail & Related papers (2024-08-06T19:10:25Z)
Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI. As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
Composing Complex and Hybrid AI Solutions [52.00820391621739]
We describe an extension of the Acumos system towards enabling the above features for general AI applications. Our extensions include support for more generic components with gRPC/Protobuf interfaces. We provide examples of deployable solutions and their interfaces.
arXiv Detail & Related papers (2022-02-25T08:57:06Z)
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators [4.055002321981825]
We present a HW-SW co-design ecosystem for spatial accelerators called Union. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. We demonstrate the value of Union for the community with several case studies.
arXiv Detail & Related papers (2021-09-15T16:42:18Z)
Bring Your Own Codegen to Deep Learning Compiler [8.87545486816377]
This paper proposes an open source framework that enables users to only concentrate on the development of their proprietary code generation tools. Our framework provides users flexible and easy-to-use interfaces to partition their models into segments that can be executed on "the best" processors.
arXiv Detail & Related papers (2021-05-03T17:22:25Z)
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [18.04657939198617]
This paper provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of machine learning models on hardware accelerators. It analyzes different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs. The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors.
arXiv Detail & Related papers (2020-07-02T04:08:40Z)
Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack [1.8337659614890698]
Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators. programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA fabric.
arXiv Detail & Related papers (2020-04-20T10:12:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.