Apollo: Transferable Architecture Exploration
- URL: http://arxiv.org/abs/2102.01723v1
- Date: Tue, 2 Feb 2021 19:36:02 GMT
- Title: Apollo: Transferable Architecture Exploration
- Authors: Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou,
Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi
Narayanaswami, James Laudon
- Abstract summary: We propose a transferable architecture exploration framework, dubbed Apollo.
We show that our framework finds high-reward design configurations more sample-efficiently than a baseline black-box optimization approach.
- Score: 26.489275442359464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The looming end of Moore's Law and the ascending use of deep learning
drive the design of custom accelerators that are optimized for specific neural
architectures. Architecture exploration for such accelerators forms a
challenging constrained optimization problem over a complex, high-dimensional,
and structured input space with a costly-to-evaluate objective function.
Existing approaches for accelerator design are sample-inefficient and do not
transfer knowledge between related optimization tasks with different design
constraints, such as area and/or latency budgets, or neural architecture
configurations. In this work, we propose a transferable architecture
exploration framework, dubbed Apollo, that leverages recent advances in
black-box function optimization for sample-efficient accelerator design. We use
this framework to optimize accelerator configurations of a diverse set of
neural architectures with alternative design constraints. We show that our
framework finds high-reward design configurations (up to 24.6% speedup) more
sample-efficiently than a baseline black-box optimization approach. We further
show that by transferring knowledge between target architectures with different
design constraints, Apollo is able to find optimal configurations faster, and
often with a better objective value (up to 25% improvement). This encouraging
outcome suggests a promising path forward for generating higher-quality
accelerators.
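To make the problem framing above concrete, the following is a minimal, self-contained sketch of a constrained black-box search over accelerator configurations. The parameter grid, the toy area and latency models, and the simulate_latency_ms() stub are hypothetical placeholders for a real design space and a costly cycle-accurate simulator; the random-search loop stands in for the kind of baseline optimizer Apollo is compared against, not for Apollo's own method.

```python
# A minimal sketch of constrained black-box search over accelerator
# configurations, illustrating the problem framing in the abstract.
# The parameter grid, the toy area/latency models, and the budget are
# hypothetical; Apollo itself relies on more sample-efficient optimizers
# than the random-search baseline shown here.
import random

DESIGN_SPACE = {
    "num_pes":     [32, 64, 128, 256],     # processing elements
    "l2_kb":       [128, 256, 512, 1024],  # on-chip buffer size (KB)
    "mem_bw_gbps": [50, 100, 200],         # DRAM bandwidth
}
AREA_BUDGET_MM2 = 18.0                     # example design constraint


def area_mm2(cfg):
    # Toy analytical area model (placeholder for a real estimator).
    return 0.05 * cfg["num_pes"] + 0.01 * cfg["l2_kb"]


def simulate_latency_ms(cfg):
    # Placeholder for the costly objective: in practice this would invoke
    # a cycle-accurate simulator for a given neural architecture.
    return (100.0 / cfg["num_pes"]
            + 50.0 / cfg["mem_bw_gbps"]
            + 10.0 * 128 / cfg["l2_kb"])


def random_search(num_samples=100, seed=0):
    # Baseline black-box loop: sample, reject infeasible designs, keep the best.
    rng = random.Random(seed)
    best_cfg, best_latency = None, float("inf")
    for _ in range(num_samples):
        cfg = {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}
        if area_mm2(cfg) > AREA_BUDGET_MM2:   # constraint check
            continue
        latency = simulate_latency_ms(cfg)    # expensive evaluation
        if latency < best_latency:
            best_cfg, best_latency = cfg, latency
    return best_cfg, best_latency


if __name__ == "__main__":
    print(random_search())
```

In the transfer setting the abstract describes, evaluations collected under one area or latency budget would be used to warm-start the search under a different budget or neural architecture, rather than starting each search from scratch.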
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid adversarial examples that exploit the learned model.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration [5.2487252195308844]
This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators.
We introduce novel optimization and transformation tasks for building design-flow architectures.
Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks.
arXiv Detail & Related papers (2023-06-14T21:06:07Z)
- AutoML for neuromorphic computing and application-driven co-design: asynchronous, massively parallel optimization of spiking architectures [3.8937756915387505]
We have extended AutoML-inspired approaches to the exploration and optimization of neuromorphic architectures.
We are able to efficiently explore the configuration space of neuromorphic architectures and identify the subset of conditions leading to the highest performance.
arXiv Detail & Related papers (2023-02-26T02:26:45Z)
- A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators [22.69558355718029]
Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach to reduce the size of the total design space by orders of magnitude, yet without losing optimality.
arXiv Detail & Related papers (2022-03-25T21:49:42Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- A Construction Kit for Efficient Low Power Neural Network Accelerator Designs [11.807678100385164]
This work provides a survey of neural network accelerator optimization approaches that have been used in recent works.
It presents the list of optimizations and their quantitative effects as a construction kit, allowing the design choices for each building block to be assessed separately.
arXiv Detail & Related papers (2021-06-24T07:53:56Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
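For context, the hypergradient referred to here corresponds to the generic implicit-function-theorem identity for bilevel problems of the DARTS form, shown below; the notation is illustrative rather than the paper's own, and in practice the inverse Hessian-vector product is approximated rather than formed explicitly.

```latex
% Bilevel NAS objective: architecture parameters \alpha, network weights w.
%   \min_{\alpha} \mathcal{L}_{\mathrm{val}}(w^{*}(\alpha), \alpha)
%   \text{ s.t. } w^{*}(\alpha) = \arg\min_{w} \mathcal{L}_{\mathrm{train}}(w, \alpha)
% Implicit function theorem (assuming the inner Hessian is invertible),
% with all derivatives evaluated at (w^{*}(\alpha), \alpha):
\nabla_{\alpha}\,\mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\alpha),\alpha\bigr)
  = \nabla_{\alpha}\mathcal{L}_{\mathrm{val}}
  - \nabla^{2}_{\alpha w}\mathcal{L}_{\mathrm{train}}
    \bigl[\nabla^{2}_{w w}\mathcal{L}_{\mathrm{train}}\bigr]^{-1}
    \nabla_{w}\mathcal{L}_{\mathrm{val}}
```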
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Towards Accurate and Compact Architectures via Neural Architecture Transformer [95.4514639013144]
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset.
We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, which yields state-of-the-art performance.
When a latency constraint is adopted, our method also outperforms the previous best-performing mobile models, with a 77.9% top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
- Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack [1.8337659614890698]
Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for deep-learning computing.
The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators.
Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays the physical FPGA fabric.
arXiv Detail & Related papers (2020-04-20T10:12:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.