Designing and Implementing a Generator Framework for a SIMD Abstraction Library
- URL: http://arxiv.org/abs/2407.18728v1
- Date: Fri, 26 Jul 2024 13:25:38 GMT
- Title: Designing and Implementing a Generator Framework for a SIMD Abstraction Library
- Authors: Johannes Pietrzyk, Alexander Krause, Dirk Habich, Wolfgang Lehner
- Abstract summary: We present TSLGen, a novel end-to-end framework for generating a SIMD abstraction library.
We show that programming with the generated library takes effort comparable to existing libraries and achieves the same performance results.
- Score: 53.84310825081338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Single Instruction Multiple Data (SIMD) parallel paradigm is a well-established and heavily-used hardware-driven technique to increase the single-thread performance in different system domains such as database or machine learning. Depending on the hardware vendor and the specific processor generation/version, SIMD capabilities come in different flavors concerning the register size and the supported SIMD instructions. Due to this heterogeneity and the lack of standardized calling conventions, building high-performance and portable systems is a challenging task. To address this challenge, academia and industry have invested a remarkable effort into creating SIMD abstraction libraries that provide unified access to different SIMD hardware capabilities. However, those one-size-fits-all library approaches are inherently complex, which hampers maintainability and extensibility. Furthermore, they assume similar SIMD hardware designs, which may be invalidated through ARM SVE's emergence. Additionally, while existing SIMD abstraction libraries do a great job of hiding away the specifics of the underlying hardware, their lack of expressiveness impedes crucial algorithm design decisions for system developers. To overcome these limitations, we present TSLGen, a novel end-to-end framework approach for generating an SIMD abstraction library in this paper. We have implemented our TSLGen framework and used our generated Template SIMD Library (TSL) to program various system components from different domains. As we will show, the programming effort is comparable to existing libraries, and we achieve the same performance results. However, our framework is easy to maintain and to extend, which simultaneously supports disruptive changes to the interface by design and exposes valuable insights for assessing provided functionality.
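The idea of "unified access to different SIMD hardware capabilities" mentioned in the abstract can be illustrated with a minimal C++ sketch. It is not the TSL API or the output of TSLGen: every name here (`scalar_tag`, `sse_tag`, `simd_traits`, `sum`) is hypothetical and chosen only for illustration. The sketch assumes a template parameter selects the target, so the same algorithm compiles against a scalar fallback or, where the compiler defines `__SSE2__`, against 128-bit SSE2 registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical target tags; not part of TSL.
struct scalar_tag {};
#if defined(__SSE2__)
struct sse_tag {};
#endif

// Primary template: one specialization per hardware target.
template <typename Tag> struct simd_traits;

// Scalar fallback: a "register" holds exactly one 32-bit lane.
template <> struct simd_traits<scalar_tag> {
    using reg = std::int32_t;
    static constexpr std::size_t lanes = 1;
    static reg zero() { return 0; }
    static reg load(const std::int32_t* p) { return *p; }
    static reg add(reg a, reg b) { return a + b; }
    static std::int32_t hadd(reg a) { return a; }
};

#if defined(__SSE2__)
// SSE2 target: a register holds four 32-bit lanes.
template <> struct simd_traits<sse_tag> {
    using reg = __m128i;
    static constexpr std::size_t lanes = 4;
    static reg zero() { return _mm_setzero_si128(); }
    static reg load(const std::int32_t* p) {
        return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    }
    static reg add(reg a, reg b) { return _mm_add_epi32(a, b); }
    // Horizontal add: spill the lanes to memory and sum them.
    static std::int32_t hadd(reg a) {
        std::int32_t tmp[4];
        _mm_storeu_si128(reinterpret_cast<__m128i*>(tmp), a);
        return tmp[0] + tmp[1] + tmp[2] + tmp[3];
    }
};
#endif

// Target-independent algorithm: sums n integers using whatever
// register width the selected target provides.
template <typename Tag>
std::int32_t sum(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    using T = simd_traits<Tag>;
    typename T::reg acc = T::zero();
    std::size_t i = 0;
    for (; i + T::lanes <= n; i += T::lanes) {
        acc = T::add(acc, T::load(data + i));
    }
    std::int32_t result = T::hadd(acc);
    for (; i < n; ++i) {  // scalar remainder
        result += data[i];
    }
    return result;
}
```

Calling `sum<scalar_tag>(v, n)` and `sum<sse_tag>(v, n)` on the same array should give the same result; only the register width changes. A hand-written design like this assumes every target has a fixed register size, which is the kind of assumption the abstract says ARM SVE may invalidate.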
Related papers
- Scalable, Tokenization-Free Diffusion Model Architectures with Efficient Initial Convolution and Fixed-Size Reusable Structures for On-Device Image Generation [0.0]
Vision Transformers and U-Net architectures have been widely adopted in the implementation of Diffusion Models.
We propose an architecture that utilizes a fixed-size, reusable transformer block as a core structure.
Our architecture is characterized by low complexity, token-free design, absence of positional embeddings, uniformity, and scalability.
arXiv Detail & Related papers (2024-11-09T08:58:57Z)
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- MILP-StuDio: MILP Instance Generation via Block Structure Decomposition [55.79888361191114]
Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications.
We propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures.
arXiv Detail & Related papers (2024-10-30T08:33:27Z)
- CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS [0.0]
We propose CARLOS - an open, modular, and scalable simulation framework for the development and testing of software in C-ITS.
We provide core building blocks for this framework and explain how it can be used and extended by the community.
In our paper, we motivate the architecture by describing important design principles and showcasing three major use cases.
arXiv Detail & Related papers (2024-04-02T10:48:36Z)
- Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations.
In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
arXiv Detail & Related papers (2023-02-17T17:13:35Z)
- MLIR: A Compiler Infrastructure for the End of Moore's Law [14.795080852112083]
MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, and significantly reduce the cost of building domain specific compilers.
MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction.
arXiv Detail & Related papers (2020-02-25T17:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.