Related papers: A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study

A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study

URL: http://arxiv.org/abs/2407.09333v2
Date: Thu, 17 Oct 2024 10:06:57 GMT
Title: A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study
Authors: Zhiyuan Tan, Liutong Han, Mingjie Xing, Yanjun Wu,
Abstract summary: This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures. We present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems.
Score: 8.06660833012594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the era of diminishing returns from Moores Law, heterogeneous computing systems have emerged as a vital approach to enhance computational efficiency. This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures. The hyper dialect abstracts the complexities of heterogeneous computing by providing a unified compilation framework that efficiently schedules tasks and manages data communication. To demonstrate its capabilities, we present HETOCompiler, a cryptography-focused compiler prototype that implements multiple hash algorithms and enables their execution on heterogeneous systems. The proposed approach achieves performance improvements over existing programming models for heterogeneous computing (OpenCL), offering an average speedup of 1.93x, 1.18x, and 1.12x for SHA-1, MD5, and SM3 algorithms, respectively. Our findings highlight the potential of the hyper dialect in harnessing the full computational power of heterogeneous devices, advancing the field of compiler design for heterogeneous systems.

Related papers

Efficient Compilation for Shuttling Trapped-Ion Machines via the Position Graph Architectural Abstraction [0.9199465050084297]
This work presents a novel unifying abstraction, called the position graph, for different types of hardware architectures.<n>We model trapped-ion Quantum Charge-Coupled Device (QCCD) architectures and enable high-quality, superconducting scalable compilation methods.<n>This approach generates native, executable circuits and ion instructions on the hardware that adheres to the physical constraints of shuttling-based quantum computers.
arXiv Detail & Related papers (2025-01-21T19:39:03Z)
Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU [0.5831737970661138]
Homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words.
arXiv Detail & Related papers (2025-01-13T18:15:44Z)
gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography [15.39096542261856]
Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations. This paper presents gECC, a versatile framework for ECC optimized for GPU architectures.
arXiv Detail & Related papers (2024-12-22T01:50:50Z)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE. Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data. Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z)
CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity [110.50364486645852]
Communication complexity has become a major bottleneck for speeding up training and scaling up machine numbers. We propose Common Om REOm, which can be used to compress information transmitted between machines.
arXiv Detail & Related papers (2023-09-23T08:45:27Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development in two steps: 1) Expressing the computational core using Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems [1.885335997132172]
Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters. This paper proposes extensions to OpenMP for autonomous, machine learning-driven adaptation. Our solution includes a set of novel language constructs, compiler transformations, and runtime support.
arXiv Detail & Related papers (2023-03-15T18:37:18Z)
Efficient and Sound Differentiable Programming in a Functional Array-Processing Language [4.1779847272994495]
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. We present an AD system for a higher-order functional array-processing language. In combination, computation with forward-mode AD can be as efficient as reverse mode.
arXiv Detail & Related papers (2022-12-20T14:54:47Z)
H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness [16.244832640402496]
We propose a novel mapping algorithm with both computation and communication awareness. By slightly trading computation for communication, the system overall latency and energy consumption can be largely reduced. The superior performance of our work is evaluated based on MAESTRO modeling.
arXiv Detail & Related papers (2022-04-29T02:26:18Z)
Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler. We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax. Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
Parallel Scheduling Self-attention Mechanism: Generalization and Optimization [0.76146285961466]
We propose a general scheduling algorithm, which is derived from the optimum scheduling for small instances solved by a satisfiability checking(SAT) solver. Strategies for further optimization on skipping redundant computations are put forward as well, with which reductions of almost 25% and 50% of the original computations are respectively achieved. The proposed algorithms are applicable regardless of problem sizes, as long as the number of input vectors is divisible to the number of computing units available in the architecture.
arXiv Detail & Related papers (2020-12-02T12:04:16Z)
Extending C++ for Heterogeneous Quantum-Classical Computing [56.782064931823015]
qcor is a language extension to C++ and compiler implementation that enables heterogeneous quantum-classical programming, compilation, and execution in a single-source context. Our work provides a first-of-its-kind C++ compiler enabling high-level quantum kernel (function) expression in a quantum-language manner.
arXiv Detail & Related papers (2020-10-08T12:49:07Z)
Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed. An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed. We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.