Related papers: Tao: Re-Thinking DL-based Microarchitecture Simulation

URL: http://arxiv.org/abs/2404.10921v2
Date: Mon, 29 Apr 2024 22:14:32 GMT
Title: Tao: Re-Thinking DL-based Microarchitecture Simulation
Authors: Santosh Pandey, Amir Yazdanbakhsh, Hang Liu
Abstract summary: Existing microarchitecture simulators excel and fall short at different aspects. Deep learning (DL)-based simulations are remarkably fast and have acceptable accuracy but fail to provide adequate low-level microarchitectural performance metrics. This paper introduces TAO that redesigns the DL-based simulation with three primary contributions.
Score: 8.501776613988484
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Microarchitecture simulators are indispensable tools for microarchitecture designers to validate, estimate, and optimize new hardware that meets specific design requirements. While the quest for a fast, accurate and detailed microarchitecture simulation has been ongoing for decades, existing simulators excel and fall short at different aspects: (i) Although execution-driven simulation is accurate and detailed, it is extremely slow and requires expert-level experience to design. (ii) Trace-driven simulation reuses the execution traces in pursuit of fast simulation but faces accuracy concerns and fails to achieve significant speedup. (iii) Emerging deep learning (DL)-based simulations are remarkably fast and have acceptable accuracy but fail to provide adequate low-level microarchitectural performance metrics crucial for microarchitectural bottleneck analysis. Additionally, they introduce substantial overheads from trace regeneration and model re-training when simulating a new microarchitecture. Re-thinking the advantages and limitations of the aforementioned simulation paradigms, this paper introduces TAO that redesigns the DL-based simulation with three primary contributions: First, we propose a new training dataset design such that the subsequent simulation only needs functional trace as inputs, which can be rapidly generated and reused across microarchitectures. Second, we redesign the input features and the DL model using self-attention to support predicting various performance metrics. Third, we propose techniques to train a microarchitecture agnostic embedding layer that enables fast transfer learning between different microarchitectural configurations and reduces the re-training overhead of conventional DL-based simulators. Our extensive evaluation shows TAO can reduce the overall training and simulation time by 18.06x over the state-of-the-art DL-based endeavors.

Related papers

NeuralCFD: Deep Learning on High-Fidelity Automotive Aerodynamics Simulations [11.849142587216903]
Key challenges must be overcome before neural network-based simulation surrogates can be implemented at an industry scale. We introduce Geometry-preserving Universal Physics Transformer (GP-UPT), which separates geometry encoding and physics predictions. GP-UPT circumvents the creation of high-quality simulation meshes, enables accurate 3D velocity field predictions at 20 million mesh cells, and excels in transfer learning from low-fidelity to high-fidelity simulation datasets.
arXiv Detail & Related papers (2025-02-13T17:58:07Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
Bridging the Sim-to-Real Gap with Bayesian Inference [53.61496586090384]
We present SIM-FSVGD for learning robot dynamics from data. We use low-fidelity physical priors to regularize the training of neural network models. We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system.
arXiv Detail & Related papers (2024-03-25T11:29:32Z)
Accelerating Computer Architecture Simulation through Machine Learning [0.07252027234425332]
This paper presents our approach to accelerate computer architecture simulation by leveraging machine learning techniques. Our proposed model utilizes a combination of application features and micro-architectural features to predict the performance of an application. We demonstrate the effectiveness of our approach by building and evaluating a machine learning model that offers significant speedup in architectural exploration.
arXiv Detail & Related papers (2024-02-28T23:00:57Z)
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training. We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning [61.419914155985886]
We propose SimVPv2, a streamlined model that eliminates the need for Unet architectures for spatial and temporal modeling. SimVPv2 not only simplifies the model architecture but also improves both performance and computational efficiency. On the standard Moving MNIST benchmark, SimVPv2 achieves superior performance compared to SimVP, with fewer FLOPs, about half the training time and 60% faster inference efficiency.
arXiv Detail & Related papers (2022-11-22T08:01:33Z)
Continual learning autoencoder training for a particle-in-cell simulation via streaming [52.77024349608834]
The upcoming exascale era will provide a new generation of physics simulations with high resolution. These simulations will have a high resolution, which will impact the training of machine learning models since storing a high amount of simulation data on disk is nearly impossible. This work presents an approach that trains a neural network concurrently with a running simulation without data on a disk.
arXiv Detail & Related papers (2022-11-09T09:55:14Z)
Use of Multifidelity Training Data and Transfer Learning for Efficient Construction of Subsurface Flow Surrogate Models [0.0]
To construct data-driven surrogate models, several thousand high-fidelity simulation runs may be required to provide training samples. We present a framework where most of the training simulations are performed on coarsened geomodels. The network provides results that are significantly more accurate than the low-fidelity simulations used for most of the training.
arXiv Detail & Related papers (2022-04-23T20:09:49Z)
Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME. PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively. In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
Opportunistic Emulation of Computationally Expensive Simulations via Deep Learning [9.13837510233406]
We investigate the use of deep neural networks for opportunistic model emulation of APSIM models. We focus on emulating four important outputs of the APSIM model: runoff, soil_loss, DINrunoff, Nleached.
arXiv Detail & Related papers (2021-08-25T05:57:16Z)
Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for simulations. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models. The results demonstrate STNP outperforms the baselines in the learning setting and LIG achieves the state-of-the-art for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
SimNet: Computer Architecture Simulation using Machine Learning [3.7019798164954336]
This work describes a concerted effort, where machine learning (ML) is used to accelerate discrete-event simulation. A GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor. Its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator.
arXiv Detail & Related papers (2021-05-12T17:31:52Z)
STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators [5.326345912766044]
STONNE is a cycle-accurate, highly-modular and highly-extensible simulation framework. We show how it can closely approach the performance results of the publicly available BSV-coded MAERI implementation.
arXiv Detail & Related papers (2020-06-10T19:20:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.