A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study
- URL: http://arxiv.org/abs/2407.09333v2
- Date: Thu, 17 Oct 2024 10:06:57 GMT
- Title: A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study
- Authors: Zhiyuan Tan, Liutong Han, Mingjie Xing, Yanjun Wu,
- Abstract summary: This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures.
We present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems.
- Score: 8.06660833012594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of diminishing returns from Moores Law, heterogeneous computing systems have emerged as a vital approach to enhance computational efficiency. This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures. The hyper dialect abstracts the complexities of heterogeneous computing by providing a unified compilation framework that efficiently schedules tasks and manages data communication. To demonstrate its capabilities, we present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems. The proposed approach achieves performance improvements over existing programming models for heterogeneous computing (OpenCL), offering an average speedup of 1.93x, 1.18x, and 1.12x for SHA-1, MD5, and SM3 algorithms, respectively. Our findings highlight the potential of the hyper dialect in harnessing the full computational power of heterogeneous devices, advancing the field of compiler design for heterogeneous systems.
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using
Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs.
Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data.
Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z) - CORE: Common Random Reconstruction for Distributed Optimization with
Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers.
We propose Common Om REOm, which can be used to compress information transmitted between machines.
arXiv Detail & Related papers (2023-09-23T08:45:27Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - Machine Learning-Driven Adaptive OpenMP For Portable Performance on
Heterogeneous Systems [1.885335997132172]
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters.
This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation.
Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z) - Efficient and Sound Differentiable Programming in a Functional
Array-Processing Language [4.1779847272994495]
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program.
We present an AD system for a higher-order functional array-processing language.
In combination, computation with forward-mode AD can be as efficient as reverse mode.
arXiv Detail & Related papers (2022-12-20T14:54:47Z) - H2H: Heterogeneous Model to Heterogeneous System Mapping with
Computation and Communication Awareness [16.244832640402496]
We propose a novel mapping algorithm with both computation and communication awareness.
By slightly trading computation for communication, the system overall latency and energy consumption can be largely reduced.
The superior performance of our work is evaluated based on MAESTRO modeling.
arXiv Detail & Related papers (2022-04-29T02:26:18Z) - Enabling Retargetable Optimizing Compilers for Quantum Accelerators via
a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z) - Parallel Scheduling Self-attention Mechanism: Generalization and
Optimization [0.76146285961466]
We propose a general scheduling algorithm, which is derived from the optimum scheduling for small instances solved by a satisfiability checking(SAT) solver.
Strategies for further optimization on skipping redundant computations are put forward as well, with which reductions of almost 25% and 50% of the original computations are respectively achieved.
The proposed algorithms are applicable regardless of problem sizes, as long as the number of input vectors is divisible to the number of computing units available in the architecture.
arXiv Detail & Related papers (2020-12-02T12:04:16Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.