Apollo: Transferable Architecture Exploration
- URL: http://arxiv.org/abs/2102.01723v1
- Date: Tue, 2 Feb 2021 19:36:02 GMT
- Title: Apollo: Transferable Architecture Exploration
- Authors: Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou,
Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi
Narayanaswami, James Laudon
- Abstract summary: We propose a transferable architecture exploration framework, dubbed Apollo.
We show that our framework finds high-reward design configurations more sample-efficiently than a baseline black-box optimization approach.
- Score: 26.489275442359464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The looming end of Moore's Law and the ascending use of deep learning
drive the design of custom accelerators that are optimized for specific neural
architectures. Architecture exploration for such accelerators forms a
challenging constrained optimization problem over a complex, high-dimensional,
and structured input space with a costly-to-evaluate objective function.
Existing approaches for accelerator design are sample-inefficient and do not
transfer knowledge between related optimization tasks with different design
constraints, such as area and/or latency budgets, or neural architecture
configurations. In this work, we propose a transferable architecture
exploration framework, dubbed Apollo, that leverages recent advances in
black-box function optimization for sample-efficient accelerator design. We use
this framework to optimize accelerator configurations of a diverse set of
neural architectures with alternative design constraints. We show that our
framework finds high-reward design configurations (up to 24.6% speedup) more
sample-efficiently than a baseline black-box optimization approach. We further
show that by transferring knowledge between target architectures with different
design constraints, Apollo is able to find optimal configurations faster, and
often with a better objective value (up to 25% improvement). This encouraging
outcome suggests a promising path forward for generating higher-quality
accelerators.
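To make the problem framing above concrete, the following is a minimal, self-contained sketch of a constrained black-box search over accelerator configurations. The parameter grid, the toy area and latency models, and the simulate_latency_ms() stub are hypothetical placeholders for a real design space and a costly cycle-accurate simulator; the random-search loop stands in for the kind of baseline optimizer Apollo is compared against, not for Apollo's own method.

```python
# A minimal sketch of constrained black-box search over accelerator
# configurations, illustrating the problem framing in the abstract.
# The parameter grid, the toy area/latency models, and the budget are
# hypothetical; Apollo itself relies on more sample-efficient optimizers
# than the random-search baseline shown here.
import random

DESIGN_SPACE = {
    "num_pes":     [32, 64, 128, 256],     # processing elements
    "l2_kb":       [128, 256, 512, 1024],  # on-chip buffer size (KB)
    "mem_bw_gbps": [50, 100, 200],         # DRAM bandwidth
}
AREA_BUDGET_MM2 = 18.0                     # example design constraint


def area_mm2(cfg):
    # Toy analytical area model (placeholder for a real estimator).
    return 0.05 * cfg["num_pes"] + 0.01 * cfg["l2_kb"]


def simulate_latency_ms(cfg):
    # Placeholder for the costly objective: in practice this would invoke
    # a cycle-accurate simulator for a given neural architecture.
    return (100.0 / cfg["num_pes"]
            + 50.0 / cfg["mem_bw_gbps"]
            + 10.0 * 128 / cfg["l2_kb"])


def random_search(num_samples=100, seed=0):
    # Baseline black-box loop: sample, reject infeasible designs, keep the best.
    rng = random.Random(seed)
    best_cfg, best_latency = None, float("inf")
    for _ in range(num_samples):
        cfg = {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}
        if area_mm2(cfg) > AREA_BUDGET_MM2:   # constraint check
            continue
        latency = simulate_latency_ms(cfg)    # expensive evaluation
        if latency < best_latency:
            best_cfg, best_latency = cfg, latency
    return best_cfg, best_latency


if __name__ == "__main__":
    print(random_search())
```

In the transfer setting the abstract describes, evaluations collected under one area or latency budget would be used to warm-start the search under a different budget or neural architecture, rather than starting each search from scratch.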
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid adversarial examples that exploit the learned model.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration [5.2487252195308844]
This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators.
We introduce novel optimization and transformation tasks for building design-flow architectures.
Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks.
arXiv Detail & Related papers (2023-06-14T21:06:07Z)
- AutoML for neuromorphic computing and application-driven co-design: asynchronous, massively parallel optimization of spiking architectures [3.8937756915387505]
We have extended AutoML-inspired approaches to the exploration and optimization of neuromorphic architectures.
We are able to efficiently explore the configuration space of neuromorphic architectures and identify the subset of conditions leading to the highest performance.
arXiv Detail & Related papers (2023-02-26T02:26:45Z)
- A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators [22.69558355718029]
Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach to reduce the size of the total design space by orders of magnitude, yet without losing optimality.
arXiv Detail & Related papers (2022-03-25T21:49:42Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- A Construction Kit for Efficient Low Power Neural Network Accelerator Designs [11.807678100385164]
This work provides a survey of neural network accelerator optimization approaches that have been used in recent works.
It presents the list of optimizations and their quantitative effects as a construction kit, allowing the design choices for each building block to be assessed separately.
arXiv Detail & Related papers (2021-06-24T07:53:56Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
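For context, the hypergradient referred to here corresponds to the generic implicit-function-theorem identity for bilevel problems of the DARTS form, shown below; the notation is illustrative rather than the paper's own, and in practice the inverse Hessian-vector product is approximated rather than formed explicitly.

```latex
% Bilevel NAS objective: architecture parameters \alpha, network weights w.
%   \min_{\alpha} \mathcal{L}_{\mathrm{val}}(w^{*}(\alpha), \alpha)
%   \text{ s.t. } w^{*}(\alpha) = \arg\min_{w} \mathcal{L}_{\mathrm{train}}(w, \alpha)
% Implicit function theorem (assuming the inner Hessian is invertible),
% with all derivatives evaluated at (w^{*}(\alpha), \alpha):
\nabla_{\alpha}\,\mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\alpha),\alpha\bigr)
  = \nabla_{\alpha}\mathcal{L}_{\mathrm{val}}
  - \nabla^{2}_{\alpha w}\mathcal{L}_{\mathrm{train}}
    \bigl[\nabla^{2}_{w w}\mathcal{L}_{\mathrm{train}}\bigr]^{-1}
    \nabla_{w}\mathcal{L}_{\mathrm{val}}
```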
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Towards Accurate and Compact Architectures via Neural Architecture Transformer [95.4514639013144]
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset.
We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, which yields state-of-the-art performance.
When a latency constraint is adopted, our method also outperforms the previous best-performing mobile models, with a 77.9% top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
- Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack [1.8337659614890698]
Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for deep-learning computing.
The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators.
Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays the physical FPGA fabric.
arXiv Detail & Related papers (2020-04-20T10:12:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.