On the Viability of using LLMs for SW/HW Co-Design: An Example in
Designing CiM DNN Accelerators
- URL: http://arxiv.org/abs/2306.06923v1
- Date: Mon, 12 Jun 2023 07:50:53 GMT
- Title: On the Viability of using LLMs for SW/HW Co-Design: An Example in
Designing CiM DNN Accelerators
- Authors: Zheyu Yan, Yifan Qin, Xiaobo Sharon Hu, Yiyu Shi
- Abstract summary: Deep Neural Networks (DNNs) have demonstrated impressive performance across a wide range of tasks.
deploying DNNs on edge devices poses significant challenges due to stringent power and computational budgets.
We present a novel approach that leverages Large Language Models (LLMs) to address this issue.
- Score: 14.02304927398616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Neural Networks (DNNs) have demonstrated impressive performance across a
wide range of tasks. However, deploying DNNs on edge devices poses significant
challenges due to stringent power and computational budgets. An effective
solution to this issue is software-hardware (SW-HW) co-design, which allows for
the tailored creation of DNN models and hardware architectures that optimally
utilize available resources. However, SW-HW co-design traditionally suffers
from slow optimization speeds because their optimizers do not make use of
heuristic knowledge, also known as the ``cold start'' problem. In this study,
we present a novel approach that leverages Large Language Models (LLMs) to
address this issue. By utilizing the abundant knowledge of pre-trained LLMs in
the co-design optimization process, we effectively bypass the cold start
problem, substantially accelerating the design process. The proposed method
achieves a significant speedup of 25x. This advancement paves the way for the
rapid and efficient deployment of DNNs on edge devices.
Related papers
- MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration [8.43012094714496]
This paper presents a unified framework for codifying and automating optimization strategies to deploy deep neural networks (DNNs) on resource-constrained hardware.
Our novel approach addresses two key issues: cross-stage co-optimization and optimization search.
Experimental results demonstrate up to a 92% DSP and 89% LUT usage reduction for select networks.
arXiv Detail & Related papers (2025-02-09T11:02:06Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs [10.690389829735661]
This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization.
Our experimental results demonstrate up to a 70-fold speedup in exploration time.
arXiv Detail & Related papers (2024-03-15T21:14:44Z) - LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z) - MetaML: Automating Customizable Cross-Stage Design-Flow for Deep
Learning Acceleration [5.2487252195308844]
This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators.
We introduce novel optimization and transformation tasks for building design-flow architectures.
Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks.
arXiv Detail & Related papers (2023-06-14T21:06:07Z) - DeepAxe: A Framework for Exploration of Approximation and Reliability
Trade-offs in DNN Accelerators [0.9556128246747769]
The role of Deep Neural Networks (DNNs) in safety-critical applications is expanding.
DNNs experience massive growth in terms of computation power.
It raises the necessity of improving the reliability of DNN accelerators.
arXiv Detail & Related papers (2023-03-14T20:42:38Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - CoSA: Scheduling by Constrained Optimization for Spatial Accelerators [1.9149970150912705]
We present CoSA, a constrained-optimization-based approach for scheduling Deep Neural Networks (DNNs) accelerators.
As opposed to existing approaches that either rely on designers's or iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem.
We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5x.
arXiv Detail & Related papers (2021-05-05T07:17:25Z) - HAPI: Hardware-Aware Progressive Inference [18.214367595727037]
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks.
Despite their popularity, CNN inference still comes at a high computational cost.
This work presents HAPI, a novel methodology for generating high-performance early-exit networks.
arXiv Detail & Related papers (2020-08-10T09:55:18Z) - Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning
and Compiler Optimization [56.3111706960878]
High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.
constrained computation and storage resources on these devices pose significant challenges for real-time inference executions.
We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
arXiv Detail & Related papers (2020-04-22T03:18:23Z) - An Image Enhancing Pattern-based Sparsity for Real-time Inference on
Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity that comprises pattern and connectivity sparsity, and becoming both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.