Related papers: Designing and Implementing a Generator Framework for a SIMD Abstraction Library

Designing and Implementing a Generator Framework for a SIMD Abstraction Library

URL: http://arxiv.org/abs/2407.18728v1
Date: Fri, 26 Jul 2024 13:25:38 GMT
Title: Designing and Implementing a Generator Framework for a SIMD Abstraction Library
Authors: Johannes Pietrzyk, Alexander Krause, Dirk Habich, Wolfgang Lehner,
Abstract summary: We present TSLGen, a novel end-to-end framework for generating an SIMD abstraction library. We show that our framework is comparable to existing libraries, and we achieve the same performance results.
Score: 53.84310825081338
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Single Instruction Multiple Data (SIMD) parallel paradigm is a well-established and heavily-used hardware-driven technique to increase the single-thread performance in different system domains such as database or machine learning. Depending on the hardware vendor and the specific processor generation/version, SIMD capabilities come in different flavors concerning the register size and the supported SIMD instructions. Due to this heterogeneity and the lack of standardized calling conventions, building high-performance and portable systems is a challenging task. To address this challenge, academia and industry have invested a remarkable effort into creating SIMD abstraction libraries that provide unified access to different SIMD hardware capabilities. However, those one-size-fits-all library approaches are inherently complex, which hampers maintainability and extensibility. Furthermore, they assume similar SIMD hardware designs, which may be invalidated through ARM SVE's emergence. Additionally, while existing SIMD abstraction libraries do a great job of hiding away the specifics of the underlying hardware, their lack of expressiveness impedes crucial algorithm design decisions for system developers. To overcome these limitations, we present TSLGen, a novel end-to-end framework approach for generating an SIMD abstraction library in this paper. We have implemented our TSLGen framework and used our generated Template SIMD Library (TSL) to program various system components from different domains. As we will show, the programming effort is comparable to existing libraries, and we achieve the same performance results. However, our framework is easy to maintain and to extend, which simultaneously supports disruptive changes to the interface by design and exposes valuable insights for assessing provided functionality.

Related papers

APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries [5.227446378450704]
APE-Bench I is the first realistic benchmark built from real-world commit histories of Mathlib4. Eleanstic is a scalable parallel verification infrastructure optimized for proof checking across multiple versions of Mathlib.
arXiv Detail & Related papers (2025-04-27T05:04:02Z)
TrapSIMD: SIMD-Aware Compiler Optimization for 2D Trapped-Ion Quantum Machines [14.239863509836864]
We present FluxTrap, a SIMD-aware compiler framework that establishes a hardware-software co-design interface for TI systems. F FluxTrap reduces execution time by up to $3.82 times$ and improves fidelity by several orders of magnitude.
arXiv Detail & Related papers (2025-04-24T18:49:51Z)
Understanding and Optimizing Multi-Stage AI Inference Pipelines [11.254219071373319]
HERMES is a Heterogeneous Multi-stage LLM inference Execution Simulator. HERMES supports heterogeneous clients executing multiple models concurrently unlike prior frameworks. We explore the impact of reasoning stages on end-to-end latency, optimal strategies for hybrid pipelines, and the architectural implications of remote KV cache retrieval.
arXiv Detail & Related papers (2025-04-14T00:29:49Z)
Simulation Streams: A Programming Paradigm for Controlling Large Language Models and Building Complex Systems with Generative AI [3.3126968968429407]
Simulation Streams is a programming paradigm designed to efficiently control and leverage Large Language Models (LLMs) Our primary goal is to create a framework that harnesses the agentic abilities of LLMs while addressing their limitations in maintaining consistency.
arXiv Detail & Related papers (2025-01-30T16:38:03Z)
EpiCoder: Encompassing Diversity and Complexity in Code Generation [49.170195362149386]
We introduce a novel feature tree-based synthesis framework inspired by Abstract Syntax Trees (AST) Unlike AST, which captures syntactic structure of code, our framework models semantic relationships between code elements. We fine-tuned widely-used base models to create the EpiCoder series, achieving state-of-the-art performance at both the function and file levels.
arXiv Detail & Related papers (2025-01-08T18:58:15Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
Scalable, Tokenization-Free Diffusion Model Architectures with Efficient Initial Convolution and Fixed-Size Reusable Structures for On-Device Image Generation [0.0]
Vision Transformers and U-Net architectures have been widely adopted in the implementation of Diffusion Models. We propose an architecture that utilizes a fixed-size, reusable transformer block as a core structure. Our architecture is characterized by low complexity, token-free design, absence of positional embeddings, uniformity, and scalability.
arXiv Detail & Related papers (2024-11-09T08:58:57Z)
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks. AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation. We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
MILP-StuDio: MILP Instance Generation via Block Structure Decomposition [55.79888361191114]
Mixed-integer linear programming (MILP) is one of the most popular mathematical formulations with numerous applications. We propose a novel MILP generation framework, called Block Structure Decomposition (MILP-StuDio), to generate high-quality instances by preserving the block structures.
arXiv Detail & Related papers (2024-10-30T08:33:27Z)
CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS [0.0]
We propose CARLOS - an open, modular, and scalable simulation framework for the development and testing of software in C-ITS. We provide core building blocks for this framework and explain how it can be used and extended by the community. In our paper, we motivate the architecture by describing important design principles and showcasing three major use cases.
arXiv Detail & Related papers (2024-04-02T10:48:36Z)
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations. In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
arXiv Detail & Related papers (2023-02-17T17:13:35Z)
MLIR: A Compiler Infrastructure for the End of Moore's Law [14.795080852112083]
MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, and significantly reduce the cost of building domain specific compilers. MLIR facilitates the design and implementation of code generators, translators and translators at different levels of abstraction.
arXiv Detail & Related papers (2020-02-25T17:24:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.