Related papers: AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

URL: http://arxiv.org/abs/2501.09954v1
Date: Fri, 17 Jan 2025 04:57:42 GMT
Title: AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations
Authors: Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna,
Abstract summary: Design space exploration plays a crucial role in enabling custom hardware architectures.<n>Recently, AIrchitect v1, the first attempt to address the limitations of DSE into a search-time classification problem.
Score: 3.6231171463908938
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural networks (DNNs) and the introduction of advanced foundational models (FMs), the design space for DNN accelerators is expanding at an exponential rate. Additionally, this space is highly non-uniform and non-convex, making it increasingly difficult to navigate and optimize. Traditional DSE techniques rely on search-based methods, which involve iterative sampling of the design space to find the optimal solution. However, this process is both time-consuming and often fails to converge to the global optima for such design spaces. Recently, AIrchitect v1, the first attempt to address the limitations of search-based techniques, transformed DSE into a constant-time classification problem using recommendation networks. In this work, we propose AIrchitect v2, a more accurate and generalizable learning-based DSE technique applicable to large-scale design spaces that overcomes the shortcomings of earlier approaches. Specifically, we devise an encoder-decoder transformer model that (a) encodes the complex design space into a uniform intermediate representation using contrastive learning and (b) leverages a novel unified representation blending the advantages of classification and regression to effectively explore the large DSE space without sacrificing accuracy. Experimental results evaluated on 10^5 real DNN workloads demonstrate that, on average, AIrchitect v2 outperforms existing techniques by 15% in identifying optimal design points. Furthermore, to demonstrate the generalizability of our method, we evaluate performance on unseen model workloads (LLMs) and attain a 1.7x improvement in inference latency on the identified hardware architecture.

Related papers

Neural Architecture Codesign for Fast Physics Applications [0.8692847090818803]
We develop a pipeline to streamline neural architecture codesign for physics applications. We employ neural architecture search and network compression in a two-stage approach to discover hardware efficient models.
arXiv Detail & Related papers (2025-01-09T19:00:03Z)
Multi-objective Optimization in CPU Design Space Exploration: Attention is All You Need [11.931035726174906]
Design Space Exploration (DSE) is essential to modern CPU design, yet current frameworks struggle to scale and generalize in high-dimensional architectural spaces.<n>We present textbfAttentionDSE, the first end-to-end DSE framework that integrates performance prediction and neurally design guidance through an attention-based architecture.<n>Key innovations include a textbfPerception-Driven Attention mechanism that exploits architectural hierarchy and locality, scaling attention complexity from $mathcalO(n2)$ to $mathcalO(n)$ via sliding windows
arXiv Detail & Related papers (2024-10-24T02:20:17Z)
RLEEGNet: Integrating Brain-Computer Interfaces with Adaptive AI for Intuitive Responsiveness and High-Accuracy Motor Imagery Classification [0.0]
We introduce a framework that leverages Reinforcement Learning with Deep Q-Networks (DQN) for classification tasks. We present a preprocessing technique for multiclass motor imagery (MI) classification in a One-Versus-The-Rest (OVR) manner. The integration of DQN with a 1D-CNN-LSTM architecture optimize the decision-making process in real-time.
arXiv Detail & Related papers (2024-02-09T02:03:13Z)
Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem. We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Searching a High-Performance Feature Extractor for Text Recognition Network [92.12492627169108]
We design a domain-specific search space by exploring principles for having good feature extractors. As the space is huge and complexly structured, no existing NAS algorithms can be applied. We propose a two-stage algorithm to effectively search in the space.
arXiv Detail & Related papers (2022-09-27T03:49:04Z)
FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures. Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that i) FreeREA is a fast, efficient, and effective search method for models automatic design.
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping application-specific hardware. We propose HLS, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs. We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS) We tackle the hypergradient computation in DARTS based on the implicit function theorem. We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset. We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, which yielded a state-of-the-art performance. When the latency constraint is adopted, our result also performs better than the previous best-performing mobile models with a 77.9% Top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN) There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution. We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity that comprises pattern and connectivity sparsity, and becoming both highly accurate and hardware friendly. Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.