Related papers: DSP-MLIR: A MLIR Dialect for Digital Signal Processing

DSP-MLIR: A MLIR Dialect for Digital Signal Processing

URL: http://arxiv.org/abs/2408.11205v1
Date: Tue, 20 Aug 2024 21:33:17 GMT
Title: DSP-MLIR: A MLIR Dialect for Digital Signal Processing
Authors: Abhinav Kumar, Atharva Khedkar, Aviral Shrivastava,
Abstract summary: In this paper, we utilize MLIR framework to introduce a DSP Dialect and perform domain-specific optimizations at dialect -level ( high-level ) We show the performance improvement in execution time for these sample apps by upto 10x which would have been difficult if the IR were at C/ affine level.
Score: 3.1688509302874652
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traditional Digital Signal Processing ( DSP ) compilers work at low level ( C-level / assembly level ) and hence lose much of the optimization opportunities present at high-level ( domain-level ). The emerging multi-level compiler infrastructure MLIR ( Multi-level Intermediate Representation ) allows to specify optimizations at higher level. In this paper, we utilize MLIR framework to introduce a DSP Dialect and perform domain-specific optimizations at dialect -level ( high-level ) and show the usefulness of these optimizations on sample DSP apps. In particular, we develop a compiler for DSP and a DSL (Domain Specific Language) to ease the development of apps. We show the performance improvement in execution time for these sample apps by upto 10x which would have been difficult if the IR were at C/ affine level.

Related papers

Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation [3.271964502485636]
We propose a two-stage training strategy to enhance visual programming languages (VPLs) generation. First, we employ retrieval-augmented fine-tuning to leverage the repetitive use of subroutines commonly seen in industrial VPLs. Second, we apply direct preference optimization (DPO) to further guide the model toward accurate outputs.
arXiv Detail & Related papers (2025-02-23T10:27:44Z)
LLM-Aided Compilation for Tensor Accelerators [6.709490736813537]
We discuss how large language models (LLMs) could be leveraged to build a compiler for hardware accelerators. Specifically, we demonstrate the ability of GPT-4 to achieve high pass rates in translating code to the Gemmini accelerator. We also propose a 2-phase workflow for utilizing LLMs to generate hardware-optimized code.
arXiv Detail & Related papers (2024-08-06T19:10:25Z)
Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (MsLL) are complementary to each other, suggesting a collaborative optimization approach. Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z)
mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program Synthesis [48.01697184432969]
mlirSynth translates programs from lower-level MLIR dialects to high-level ones without manually defined rules. We demonstrate its effectiveness reviby raising C programs to two distinct high-level MLIR dialects, which enables us to use existing high-level dialect specific compilation flows.
arXiv Detail & Related papers (2023-10-06T12:21:50Z)
SEER: Super-Optimization Explorer for HLS using E-graph Rewriting with MLIR [0.3124884279860061]
High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into HLS efficient code. We show that SEER achieves up to 38x the performance within 1.4x the area of the original program.
arXiv Detail & Related papers (2023-08-15T09:05:27Z)
A Meta-Learning Based Precoder Optimization Framework for Rate-Splitting Multiple Access [53.191806757701215]
We propose the use of a meta-learning based precoder optimization framework to directly optimize the Rate-Splitting Multiple Access (RSMA) precoders with partial Channel State Information at the Transmitter (CSIT) By exploiting the overfitting of the compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need for any other training data while minimizing the total running time. Numerical results reveal that the meta-learning based solution achieves similar ASR performance to conventional precoder optimization in medium-scale scenarios, and significantly outperforms sub-optimal low complexity precoder algorithms in the large-scale
arXiv Detail & Related papers (2023-07-17T20:31:41Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations [15.659251804042748]
Woodpecker-DL (WPK) is a hardware-aware deep learning framework. WPK uses graph optimization, automated searches, domain-specific language ( DSL) and system-level exploration to accelerate inference. We show that on a maximum P100 GPU, we can achieve the speedup of 5.40 over cuDNN and 1.63 over TVM on individual operators, and run up to 1.18 times faster than TeslaRT for end-to-end model inference.
arXiv Detail & Related papers (2020-08-11T07:50:34Z)
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives. We develop novel data reuse analysis algorithms using the polyhedral model. We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.