Data-Driven Offline Optimization For Architecting Hardware Accelerators
- URL: http://arxiv.org/abs/2110.11346v1
- Date: Wed, 20 Oct 2021 17:06:09 GMT
- Title: Data-Driven Offline Optimization For Architecting Hardware Accelerators
- Authors: Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey
Levine
- Abstract summary: We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance over state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
- Score: 89.68870139177785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Industry has gradually moved towards application-specific hardware
accelerators in order to attain higher efficiency. While such a paradigm shift
is already starting to show promising results, designers need to spend
considerable manual effort and perform a large number of time-consuming
simulations to find accelerators that can accelerate multiple target
applications while obeying design constraints. Moreover, such a
"simulation-driven" approach must be re-run from scratch every time the set of
target applications or design constraints changes. An alternative paradigm is to
use a "data-driven", offline approach that utilizes logged simulation data to
architect hardware accelerators without needing any further simulations. Such
an approach not only alleviates the need to run time-consuming simulations, but
also enables data reuse and applies even when the set of target applications
changes. In this paper, we develop such a data-driven offline optimization
method for designing hardware accelerators, dubbed PRIME, that enjoys all of
these properties. Our approach learns a conservative, robust estimate of the
desired cost function, utilizes infeasible points, and optimizes the design
against this estimate without any additional simulator queries during
optimization. PRIME architects accelerators -- tailored towards both single and
multiple applications -- improving performance over state-of-the-art
simulation-driven methods by about 1.54x and 1.20x, while considerably reducing
the required total simulation time by 93% and 99%, respectively. In addition,
PRIME also architects effective accelerators for unseen applications in a
zero-shot setting, outperforming simulation-based methods by 1.26x.
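The core idea of the abstract — fit a conservative estimate of the cost function from logged simulation data, then optimize the design against that estimate with no further simulator queries — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy objective, the ridge-regression surrogate, and the pessimistic labeling of out-of-distribution samples are all stand-in assumptions (PRIME itself uses a learned neural surrogate with a conservatism term and explicitly exploits infeasible points).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for logged accelerator simulation data: each
# "design" is a feature vector; y is the (negated) cost, higher = better.
def true_objective(x):
    return -np.sum((x - 0.7) ** 2, axis=-1)

X_logged = rng.uniform(0, 1, size=(200, 4))
y_logged = true_objective(X_logged)

# Quadratic feature map + ridge regression as a toy surrogate model.
def features(X):
    return np.concatenate([X, X ** 2, np.ones((len(X), 1))], axis=1)

# Conservative-training idea: augment the regression with sampled
# candidate designs labeled with a pessimistic (low) target, so the
# surrogate does not overestimate performance away from the logged data.
X_extra = rng.uniform(0, 1, size=(200, 4))
y_pessimistic = np.full(len(X_extra), y_logged.min())

Phi = np.vstack([features(X_logged), features(X_extra)])
t = np.concatenate([y_logged, y_pessimistic])
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(Phi.shape[1]), Phi.T @ t)

def surrogate(X):
    return features(X) @ w

# Optimization without simulator queries: select the best candidate
# under the conservative surrogate from a large sampled pool.
candidates = rng.uniform(0, 1, size=(5000, 4))
best = candidates[np.argmax(surrogate(candidates))]
print(best)  # chosen design; no simulator calls during optimization
```

In this toy setting the pessimistic labels simply pull the surrogate's predictions down uniformly; in the offline-optimization setting the same mechanism matters because it discourages the optimizer from exploiting regions where the model would otherwise extrapolate optimistically.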
Related papers
- INSIGHT: Universal Neural Simulator for Analog Circuits Harnessing Autoregressive Transformers [13.94505840368669]
INSIGHT is an effective universal neural simulator in the analog front-end design automation loop.
It accurately predicts the performance metrics of analog circuits with just a few microseconds of inference time.
arXiv Detail & Related papers (2024-07-10T03:52:53Z)
- Tao: Re-Thinking DL-based Microarchitecture Simulation [8.501776613988484]
Existing microarchitecture simulators excel in some respects and fall short in others.
Deep learning (DL)-based simulations are remarkably fast and have acceptable accuracy but fail to provide adequate low-level microarchitectural performance metrics.
This paper introduces TAO, which redesigns DL-based simulation with three primary contributions.
arXiv Detail & Related papers (2024-04-16T21:45:10Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous
Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z) - Surrogate Neural Networks for Efficient Simulation-based Trajectory
Planning Optimization [28.292234483886947]
This paper presents a novel methodology that uses surrogate models in the form of neural networks to reduce the computation time of simulation-based optimization of a reference trajectory.
We find a reference trajectory that performs 74% better than the nominal one, and the numerical results show a substantial reduction in computation time for designing future trajectories.
arXiv Detail & Related papers (2023-03-30T15:44:30Z)
- TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference [6.0093441900032465]
We propose a framework that simulates transformer inference and training on a design space of accelerators.
We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models.
The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair.
arXiv Detail & Related papers (2023-03-27T02:45:18Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 competitive C++ programming submission pairs containing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- A Construction Kit for Efficient Low Power Neural Network Accelerator Designs [11.807678100385164]
This work provides a survey of neural network accelerator optimization approaches that have been used in recent works.
It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately.
arXiv Detail & Related papers (2021-06-24T07:53:56Z)
- Large Batch Simulation for Deep Reinforcement Learning [101.01408262583378]
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work.
We realize end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine.
By combining batch simulation and performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system.
arXiv Detail & Related papers (2021-03-12T00:22:50Z)
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.