Compiler-Driven Simulation of Reconfigurable Hardware Accelerators
- URL: http://arxiv.org/abs/2202.00739v1
- Date: Tue, 1 Feb 2022 20:31:04 GMT
- Title: Compiler-Driven Simulation of Reconfigurable Hardware Accelerators
- Authors: Zhijing Li, Yuwei Ye, Stephen Neuendorffer, Adrian Sampson
- Abstract summary: Existing simulators tend to two extremes: low-level and general approaches, such as RTL simulation, that can model any hardware but require substantial effort and long execution times.
This work proposes a compiler-driven simulation workflow that can model hardware accelerators.
- Score: 0.8807375890824978
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As customized accelerator design has become increasingly popular to keep up
with the demand for high performance computing, it poses challenges for modern
simulator design to adapt to such a large variety of accelerators. Existing
simulators tend to two extremes: low-level and general approaches, such as RTL
simulation, that can model any hardware but require substantial effort and long
execution times; and higher-level application-specific models that can be much
faster and easier to use but require one-off engineering effort.
This work proposes a compiler-driven simulation workflow that can model
configurable hardware accelerators. The key idea is to separate structure
representation from simulation by developing an intermediate language that can
flexibly represent a wide variety of hardware constructs. We design the Event
Queue (EQueue) dialect of MLIR, a dialect that can model arbitrary hardware
accelerators with explicit data movement and distributed event-based control;
we also implement a generic simulation engine to model EQueue programs with
hybrid MLIR dialects representing different abstraction levels. We demonstrate
two case studies of EQueue-implemented accelerators: the systolic array of
convolution and SIMD processors in a modern FPGA. In the former we show EQueue
simulation is as accurate as a state-of-the-art simulator, while offering
higher extensibility and lower iteration cost via compiler passes. In the
latter we demonstrate that our simulation flow can guide designers to efficiently
improve their designs using visualizable simulation outputs.
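The abstract's key idea, keeping the description of the hardware structure separate from a generic event-based simulation engine, can be illustrated with a small sketch. This is not the EQueue dialect or the authors' engine: the `simulate` function, the task tuples, and the `dma`/`pe` processor names are all hypothetical, chosen only to show how one engine can replay different structure descriptions.

```python
import heapq


def simulate(processors, tasks):
    """Generic event-queue engine (illustrative only, not the EQueue API).

    processors: iterable of processor names (the "structure").
    tasks: dict mapping task name -> (processor, latency_cycles, [dependencies]).
    Returns (finish_times, trace), where trace holds (task, proc, start, end).
    """
    free_at = {p: 0 for p in processors}            # cycle when each processor is idle
    pending = {t: len(deps) for t, (_, _, deps) in tasks.items()}
    ready_time = {t: 0 for t in tasks}              # cycle when all dependencies are done
    dependents = {t: [] for t in tasks}
    for t, (_, _, deps) in tasks.items():
        for d in deps:
            dependents[d].append(t)

    # The event queue is ordered by ready time; ties break on task name.
    queue = [(0, t) for t, n in pending.items() if n == 0]
    heapq.heapify(queue)

    finish, trace = {}, []
    while queue:
        ready, t = heapq.heappop(queue)
        proc, latency, _ = tasks[t]
        start = max(ready, free_at[proc])           # wait if the processor is busy
        end = start + latency
        free_at[proc] = end
        finish[t] = end
        trace.append((t, proc, start, end))
        for s in dependents[t]:                     # completion may release successors
            ready_time[s] = max(ready_time[s], end)
            pending[s] -= 1
            if pending[s] == 0:
                heapq.heappush(queue, (ready_time[s], s))
    return finish, trace


# Hypothetical structure: one DMA engine feeding one processing element.
tasks = {
    "load0": ("dma", 4, []),
    "load1": ("dma", 4, []),
    "mac0": ("pe", 2, ["load0"]),
    "mac1": ("pe", 2, ["load1"]),
}
finish, trace = simulate(["dma", "pe"], tasks)
total_cycles = max(finish.values())
```

Changing only the task table (for example, lowering the `dma` latency) yields a new cycle estimate without touching the engine, which is the kind of low-cost design iteration the abstract describes. The `trace` tuples are the sort of per-operation timing data that could be fed to a timeline visualizer.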
Related papers
- Tao: Re-Thinking DL-based Microarchitecture Simulation [8.501776613988484]
Existing microarchitecture simulators excel and fall short at different aspects.
Deep learning (DL)-based simulations are remarkably fast and have acceptable accuracy but fail to provide adequate low-level microarchitectural performance metrics.
This paper introduces TAO that redesigns the DL-based simulation with three primary contributions.
arXiv Detail & Related papers (2024-04-16T21:45:10Z)
- CityFlowER: An Efficient and Realistic Traffic Simulator with Embedded Machine Learning Models [25.567208505574072]
CityFlowER is an advanced simulator for efficient and realistic city-wide traffic simulation.
It pre-embeds Machine Learning models within the simulator, eliminating the need for external API interactions.
It offers unparalleled flexibility and efficiency, particularly in large-scale simulations.
arXiv Detail & Related papers (2024-02-09T01:19:41Z)
- Design-Space Exploration of SNN Models using Application-Specific Multi-Core Architectures [0.3599866690398789]
"RAVSim" is a cutting-edge SNN simulator, developed using and it is publicly available on their website as an official module.
RAVSim is a runtime virtual simulation environment that enables the user to interact with the model, observe its behavior of output concentration, and modify the set of parametric values at any time while the simulation is in execution.
arXiv Detail & Related papers (2024-02-07T20:41:00Z)
- Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
- DEAP: Design Space Exploration for DNN Accelerator Parallelism [0.0]
Large Language Models (LLMs) are becoming increasingly complex and powerful to train and serve.
This paper showcases how hardware and software co-design can come together and allow us to create customized hardware systems.
arXiv Detail & Related papers (2023-12-24T02:43:01Z)
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- SimNet: Computer Architecture Simulation using Machine Learning [3.7019798164954336]
This work describes a concerted effort, where machine learning (ML) is used to accelerate discrete-event simulation.
A GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor.
Its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator.
arXiv Detail & Related papers (2021-05-12T17:31:52Z)
- High-performance symbolic-numerics via multiple dispatch [52.77024349608834]
Symbolics.jl is an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs.
We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system.
We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers.
arXiv Detail & Related papers (2021-05-09T14:22:43Z)
- Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration [130.89746032163106]
We propose ALOE, a new algorithm for learning conditional and unconditional EBMs for discrete structured data.
We show that the energy function and sampler can be trained efficiently via a new variational form of power iteration.
We present an energy model guided fuzzer for software testing that achieves comparable performance to well engineered fuzzing engines like libfuzzer.
arXiv Detail & Related papers (2020-11-10T19:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.